Uploaded by ahalya.g0597

Project Proposal

Group-9 Project Proposal
I.
Dataset:
We are using the New Plant Diseases Dataset:
https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset
This dataset consists of about 87K RGB images of healthy and diseased crop leaves across
38 classes. The dataset is split 80/20 into training and validation sets, preserving the
directory structure.
Goal: The image dataset contains healthy and diseased crop leaves in 38 classes, of which
we will select 10 classes to evaluate our proposed algorithm.
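As a rough illustration of the class-selection step, the sketch below draws a reproducible
10-class subset from the dataset's folder layout. It assumes each class is a subfolder of
the training directory; the function name `select_classes`, the random sampling, and the
seed are hypothetical choices for illustration, not a selection rule fixed by this proposal.

```python
import random
from pathlib import Path

def select_classes(train_dir, n_classes=10, seed=0):
    """Return a reproducible, sorted sample of class-folder names under train_dir."""
    # Assumption: every immediate subfolder of train_dir is one class
    class_names = sorted(p.name for p in Path(train_dir).iterdir() if p.is_dir())
    if len(class_names) < n_classes:
        raise ValueError(f"only {len(class_names)} class folders found")
    rng = random.Random(seed)  # fixed seed so the same subset is drawn every run
    return sorted(rng.sample(class_names, n_classes))
```

Fixing the seed keeps the subset stable between notebook runs; a hand-picked list of class
names would work equally well.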
II.
Related Works:
We have referred to five academic papers related to Plant/Leaf disease classification
using K Nearest Neighbor. Below are our findings from the papers:
Paper-1: Coffee Plant Disease Classification Using K-Nearest Neighbor
Authors: Muhammad Alif Naufal Yasin, Wikky Fawwaz Al Maki
Link: https://ieeexplore.ieee.org/document/9914843
The dataset and techniques/algorithms used in this paper are detailed below:
Dataset: The source dataset contains 2209 images of arabica coffee leaf symptoms divided
into five classes. Only four classes are used in the paper, namely healthy, rust, phoma,
and Cercospora, with 1669 images in total.
Summary: Images are first resized before classification. Features are then extracted from
the images using Color Moments and GLCM, and classification is carried out using KNN. KNN
is chosen as the classification algorithm for its simplicity of implementation and high
performance.
Techniques used:
• Preprocessing data: Images are resized to 100 x 100 pixels to keep the size of all
photos consistent.
• Feature extraction: Features are extracted after resizing, using Color Moments and
GLCM. For Color Moments, the image is converted to the RGB and YCrCb color spaces.
The extracted features form the model, which KNN then uses to classify the test
data, yielding an accuracy value. For GLCM, the resized image is converted to
grayscale before GLCM feature extraction, and KNN classification is then performed
on GLCM features computed at various angles.
KNN Classification:
a) KNN with Color Moment: Color Moment features are tested with two color spaces
and k values from 1 to 15. The YCrCb color space gives the highest accuracy,
82.3% at k=9.
b) KNN with GLCM: Classification is performed using GLCM features at various
angles. Experiments with a single angle and with a combination of two angles
yield 61% accuracy.
c) KNN with Color Moment (RGB)+GLCM: To improve accuracy further, the two
feature extraction methods, Color Moment and GLCM, are combined. Color Moment
in the RGB color space combined with GLCM gives 74.2% accuracy, and Color
Moment in the YCrCb color space combined with GLCM gives 81.3% accuracy.
After experimenting with different values of k for the YCrCb and GLCM
combination, the highest accuracy of 83.5% is obtained.
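The Color Moment features used in this paper can be sketched as follows. This is a minimal
illustration, assuming the common definition of color moments (mean, standard deviation,
and skewness per channel, with skewness taken as the cube root of the third central
moment); the paper's exact variant may differ, and `color_moments` is a hypothetical
helper name. Color-space conversion (for example to YCrCb) is assumed to happen beforehand.

```python
import numpy as np

def color_moments(image):
    """Return mean, standard deviation, and skewness for each channel of an H x W x C image."""
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    # Skewness as the signed cube root of the third central moment (assumed definition)
    third_moment = ((pixels - mean) ** 3).mean(axis=0)
    skew = np.cbrt(third_moment)
    return np.concatenate([mean, std, skew])  # 3 * C values, 9 for a 3-channel image
```

The resulting vector can be fed directly to a KNN classifier, or concatenated with GLCM
features as in experiment c).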
Paper-2: Disease Detection in Plants Using KNN Algorithm
Authors: Surbhi Garg, Divya Dixit, Sudeept Singh Yadav
Link: https://ieeexplore.ieee.org/document/10074491
The dataset and techniques/algorithms used in this paper are detailed below:
Dataset: The paper uses the public Plant Village dataset for plant disease detection. It
contains 87,000 RGB images of healthy and unhealthy plant leaves in 38 classes, of which
25 classes are considered for examination.
Techniques used:
• Pre-Processing: The leaf image is taken as input to decide which disease to detect.
For feature extraction, the image is converted to grayscale.
• Segmentation: The k-means clustering method is used to divide the input image into
segments.
• Feature Extraction: In the second phase, the GLCM algorithm is used to extract the
image features and store them in a database. GLCM is a tabular description that
indicates the number of times different combinations of pixel intensity values
occur within an image. The algorithm determines the total number of pixels within
the image matrix and saves the computed pixels. The likeness of pixels in the
matrix is compared using the histogram procedure, the dissimilarity factor is
determined from the matrix, and the elements are normalized by dividing by the
pixel total.
• Classification: Once the features are extracted, the KNN algorithm is used to
separate the classes and identify the disease from the image.
An accuracy of 93% is achieved in identifying plant diseases.
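The co-occurrence counting described under Feature Extraction can be sketched in plain
NumPy. This is an illustrative sketch, not the paper's implementation: the function name
`glcm`, the default of 8 gray levels, and the non-negative offset convention are
assumptions, and the input is assumed to be an integer image already quantized to values
in 0..levels-1.

```python
import numpy as np

def glcm(gray, levels=8, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel pairs offset by (dy, dx), with dx, dy >= 0."""
    gray = np.asarray(gray, dtype=np.intp)
    h, w = gray.shape
    reference = gray[:h - dy, :w - dx]   # first pixel of each pair
    neighbor = gray[dy:, dx:]            # second pixel, shifted by the offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Tally how often intensity i is followed by intensity j at this offset
    np.add.at(counts, (reference.ravel(), neighbor.ravel()), 1)
    return counts / counts.sum()         # normalize so the entries sum to 1
```

For example, `glcm(image, dx=1, dy=0)` counts horizontally adjacent pairs, while
`dx=0, dy=1` counts vertically adjacent pairs.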
Paper-3: Plant Leaf Disease Recognition Using Random Forest, KNN, SVM and CNN
Authors: Bijaya Kumar Hatuwal, Aman Shakya, and Basanta Joshi
Link: https://www.researchgate.net/profile/BijayaHatuwal/publication/351708837_Plant_Leaf_Disease_Recognition_Using_Random_Forest_KNN_SVM_and_CNN/links/60a5d05092851c43da02c7d5/Plant-Leaf-DiseaseRecognition-Using-Random-Forest-KNN-SVM-and-CNN.pdf
The dataset and techniques/algorithms used in this paper are detailed below:
Dataset:
The study used JPG images covering various plant species and diseases, sourced from the
Kaggle Plant Village dataset. The dataset is divided into a ‘train’ folder with images for
training the ML models and a ‘valid’ folder with images for model validation. The split
ratio is 80% of the data for training and 20% for testing. The dataset covered various
categories as shown below.
Techniques Used:
In this study, ten properties are extracted from color and texture as features from the
images.
The mean and standard deviation of each color channel (red, green, and blue) are
calculated for each image. To reduce noise, the images are pre-processed: they are
first converted to grayscale and then blurred. The study employs the Haralick
texture features algorithm to extract texture-based features from the grayscale
images. These features include contrast, correlation, entropy, and inverse
difference moment.
The study uses these color and texture features as input for machine learning
models like K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and
Random Forest Classifier (RFC) to predict and classify plant diseases from
images.
In this process, the file path of an image is provided as input, and feature
extraction is performed to make predictions.
The KNN algorithm can be used for both classification and regression. For the KNN
model, k = 5 is selected, giving an accuracy of 76.96%. Although the highest
accuracy occurs at k = 1, that value is not used, to avoid over-reliance on a
single nearest neighbor's vote for prediction. An elbow criterion plot is also made
to determine the mean error. For KNN on the testing images, the weighted-average
precision, recall, and F1-score are 0.78, 0.77, and 0.77 respectively, with a
support of 5914.
The random forest model, built with 250 estimators, achieves an accuracy of
87.436%.
The Convolutional Neural Network (CNN) model has a training accuracy of 97.89%,
and SVM produced an accuracy of 78.61%. Among all the models, CNN produced
the highest accuracy.
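The choice of k discussed above for KNN (an elbow plot of mean error against k) can be
illustrated with a small NumPy sketch. This is not the paper's code: the helper names
`knn_predict` and `error_by_k` are hypothetical, Euclidean distance and simple majority
voting are assumed, and class labels are assumed to be non-negative integers.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Predict each test label by majority vote among the k nearest training points."""
    # Pairwise Euclidean distances, shape (n_test, n_train)
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest neighbors
    votes = y_train[nearest]                     # their labels, shape (n_test, k)
    return np.array([np.bincount(row).argmax() for row in votes])

def error_by_k(X_train, y_train, X_test, y_test, k_values):
    """Mean misclassification rate on the test split for each candidate k."""
    return {k: float(np.mean(knn_predict(X_train, y_train, X_test, k) != y_test))
            for k in k_values}
```

Plotting the returned error rates against k (for example with matplotlib) gives the elbow
curve; a small k such as 1 can score well while depending on a single neighbor's vote.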
Conclusion: Future work in this research domain involves expanding the dataset to include
a wider variety of plant species with various textures and diseases. Hyperparameter
optimization for the various machine learning models can be improved through techniques
like grid search or other algorithms, allowing more efficient model tuning and potentially
better predictive performance.
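The grid search mentioned in this conclusion could look roughly like the sketch below,
using scikit-learn's `GridSearchCV`. The data is a synthetic stand-in generated with
`make_classification` (not the plant dataset), and the parameter grid, the feature
scaling step, and 5-fold cross-validation are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a table of extracted color/texture features (10 per image)
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

# Scale features first so no single feature dominates the distance computation
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9, 11],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}

# Evaluate every parameter combination with 5-fold cross-validation
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```

The same pattern applies to SVM or random forest models by swapping the estimator and the
parameter grid.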
Paper-4: Review on Emerging Trends in Detection of Plant Diseases using Image
Processing with Machine Learning
Authors: Punitha Kartikeyan, Gyanesh Shrivastava
Link: https://www.researchgate.net/publication/348541626_Review_on_Emerging_Trends_in_Detection_of_Plant_Diseases_using_Image_Processing_with_Machine_Learning
The datasets and techniques/algorithms covered in this paper are detailed below:
Dataset:
This paper gives an overview of various papers and classification techniques. One of the
papers it refers to is "A Color and Texture Based Approach for the Detection and
Classification of Plant Leaf Disease Using KNN Classifier," which used 237 leaf images
sourced from the Arkansas plant disease database. Another is "Recognition of diseases in
paddy leaves using knn classifier," which used 300 images of diseased paddy plants.
Techniques used:
The detection of plant disease involves five major steps: image acquisition, image
pre-processing, image segmentation, feature extraction, and classification.
The paper discusses techniques for classifying plant diseases using various classifiers,
such as Support Vector Machine, Artificial Neural Network, K-Nearest Neighbors, and
other methods.
K-Nearest Neighbors
Researchers have used KNN to detect plant diseases. For instance, Xu et al. used it to
identify nitrogen and potassium deficiencies in tomato plants. They used various
techniques, including the Fourier transform for texture and the LAB color space for
color. They selected the best features using a genetic algorithm and applied a fuzzy
version of KNN. This system had an accuracy of over 82.5% in diagnosing these plant
problems.
a) In one of the studies by Suresha et al., they focused on Paddy plant diseases, such
as Blast and Brown Spot, using about 300 images of affected paddy plants. They
employed image segmentation techniques to separate diseased areas from healthy
ones and extracted features related to the shape of the affected regions. By
applying KNN, they achieved a disease classification accuracy of 76.59%.
b) Another study, by Hossain et al., targeted various plant diseases, including
Alternaria Alternata, Anthracnose, Bacterial Blight, Leaf Spot, and Canker. They
used 237 leaf images from the Arkansas plant disease database, applying
segmentation to isolate the diseased regions. Feature extraction was done using
the Gray-Level Co-occurrence Matrix (GLCM), and they utilized 5-fold
cross-validation to avoid overfitting. This approach resulted in a high disease
detection accuracy of 96.76%.
c) Arivazhagan et al. adopted an image analysis technology for disease identification
in plant leaves, achieving a remarkable accuracy rate of 94.74%.
d) Al-Hiary et al. employed Otsu segmentation and K-means clustering to identify
plant and stem diseases, utilizing color co-occurrence for feature extraction. Their
approach resulted in a robust technique with precision values ranging from 83%
to 94%.
Conclusion and Future Work:
Among various classification techniques, SVM and ANN methods have been widely
recognized for their high accuracy in plant disease detection. Future advancements may
involve the development of hybrid algorithms integrating genetic algorithms, ant colony
optimization, cuckoo optimization, and particle swarm optimization with SVM, ANN,
and KNN, promising enhanced efficiency in disease detection. Mobile applications with
built-in remedial solutions could empower farmers to easily detect various plant issues,
from leaf and stem diseases to nutrient deficiencies.
Paper-5: Rice Leaf Disease Image Classifications Using KNN Based on GLCM
Feature Extraction
Authors: R A Saputra, Suharyanto, S Wasiyanti, D F Saefudin, A Supriyatna, A Wibowo
Link: https://iopscience.iop.org/article/10.1088/1742-6596/1641/1/012080/pdf
The dataset and techniques/algorithms used in this paper are detailed below:
Dataset: The dataset has 120 images of rice leaf disease from the UCI repository. The
paper addresses how to classify rice leaf disease images into three diseases, namely
Bacterial leaf blight, Brown spot, and Leaf smut.
Summary: Initially, features for texture analysis are extracted using the GLCM method,
with five feature values: contrast, energy, entropy, homogeneity, and correlation.
Classification is then carried out using KNN, which is chosen for its simplicity of
implementation and high performance.
Techniques used:
Feature extraction: GLCM is used to extract features for texture analysis. The matrix
holds the probability that two pixels with given intensities occur at a certain distance
and angle orientation from each other in the image. The two pixel coordinates are
separated by distance d at angle orientation θ. Features such as contrast, energy,
entropy, homogeneity, and correlation are then calculated.
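The five texture features named above can be computed from a normalized co-occurrence
matrix roughly as follows. This is a sketch under stated assumptions: it uses common
textbook definitions (energy as the sum of squared entries, entropy in bits, homogeneity
with a 1 + (i - j)^2 denominator), which may differ from the paper's exact formulas and
from library conventions, and `glcm_features` is a hypothetical helper name.

```python
import numpy as np

def glcm_features(P):
    """Five texture statistics of a normalized co-occurrence matrix P (entries sum to 1)."""
    P = np.asarray(P, dtype=np.float64)
    i, j = np.indices(P.shape)                    # row and column gray levels
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * P).sum())
    if sd_i > 0 and sd_j > 0:
        correlation = ((i - mu_i) * (j - mu_j) * P).sum() / (sd_i * sd_j)
    else:
        correlation = 1.0                         # convention chosen here for a constant image
    nonzero = P[P > 0]                            # skip zero entries to avoid log(0)
    return {
        "contrast": float(((i - j) ** 2 * P).sum()),
        "energy": float((P ** 2).sum()),
        "entropy": float(-(nonzero * np.log2(nonzero)).sum()),
        "homogeneity": float((P / (1.0 + (i - j) ** 2)).sum()),
        "correlation": float(correlation),
    }
```

Computing these values for several distances d and angles θ and concatenating them gives
one feature vector per image.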
KNN Algorithm: After feature extraction, the dataset is divided into 10 parts using
ten-fold cross-validation; each part in turn serves as the testing data while the
remaining parts serve as the training data.
Measurement of accuracy of algorithms: In this test, the confusion matrix is used as a
measure of the performance of the KNN algorithm. Using the different values from
confusion matrix, the accuracy and kappa values of an algorithm model can be calculated.
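As an illustration of how accuracy and kappa follow from a confusion matrix, the sketch
below uses the standard definition of Cohen's kappa (observed agreement corrected for
chance agreement). The helper name `accuracy_and_kappa` is hypothetical, and the matrix
in the usage note is made-up data, not a result from the paper.

```python
import numpy as np

def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix (rows = true class)."""
    cm = np.asarray(confusion, dtype=np.float64)
    total = cm.sum()
    observed = np.trace(cm) / total               # fraction of correct predictions
    # Agreement expected by chance, from the row and column marginals
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For the illustrative matrix `[[30, 10], [10, 50]]`, the observed accuracy is 0.8, the
chance agreement is 0.52, and kappa is 0.28 / 0.48, roughly 0.583.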
Conclusion: GLCM is used for feature extraction and the KNN algorithm for the
classification of rice leaf disease, searching for the best k among values from 1 to 20.
The experiments show that k = 11 gives the highest accuracy of all k values tested,
65.83%, with a kappa of 0.485.
III.
From your readings, summarize the techniques that can be applied to your dataset
highlighting the pros and cons for each.
We will use the GLCM algorithm for feature extraction from the leaf images and the KNN
algorithm to classify the leaves into their classes.
Advantages of the KNN algorithm:
• KNN performs well in multi-class classification problems, making it a suitable
choice for image datasets with multiple categories or classes.
• It is adaptive: it can fit different datasets with complex and nonlinear patterns.
• KNN provides transparency in its decision-making process. You can visualize and
interpret the classification decisions by examining the k-nearest neighbors of a test
sample.
Advantages for using GLCM for feature extraction:
• GLCM is highly effective in capturing texture information within an image. It can
characterize the relationships between pixel values, allowing it to distinguish
between various textures, patterns, and structures.
• It focuses on local image regions and is extremely sensitive to small-scale
variations in texture, which makes it ideal for applications where detailed local
texture analysis is needed.
• It supports various programming languages and libraries, making it flexible to
access.
IV.
Methodology
Implementation will be done in a Jupyter notebook (Python) using libraries such as
pandas, numpy, sklearn, and matplotlib.
The methodology below will be followed stepwise:
• Pre-Processing: First, each image is resized so that all photos have a consistent
size.
• Feature Extraction: Next, feature extraction identifies the most discriminating
characteristics of each image, which a machine learning algorithm can consume more
easily. We use the GLCM technique for this.
• Classification: We use the KNN algorithm to classify the leaves into their classes
and to identify the disease according to the extracted features.
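The three steps above can be sketched end to end as follows. This is a minimal
illustration under several assumptions, not the final implementation: it runs on synthetic
grayscale textures instead of the leaf images, uses a simple nearest-neighbor resize to
64 x 64, quantizes to 8 gray levels, computes only three GLCM features from the
horizontal offset, and fixes k = 5. The helper names `resize_nearest` and
`glcm_feature_vector` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def resize_nearest(gray, size=64):
    """Pre-processing: nearest-neighbor resize of a 2-D grayscale array to size x size."""
    rows = np.arange(size) * gray.shape[0] // size
    cols = np.arange(size) * gray.shape[1] // size
    return gray[np.ix_(rows, cols)]

def glcm_feature_vector(gray, levels=8):
    """Feature extraction: contrast, energy, homogeneity of the horizontal-neighbor GLCM."""
    # Quantize 0..255 intensities to 0..levels-1
    q = (gray.astype(np.float64) / 256.0 * levels).astype(np.intp)
    counts = np.zeros((levels, levels))
    np.add.at(counts, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    P = counts / counts.sum()
    i, j = np.indices(P.shape)
    contrast = ((i - j) ** 2 * P).sum()
    energy = (P ** 2).sum()
    homogeneity = (P / (1.0 + (i - j) ** 2)).sum()
    return np.array([contrast, energy, homogeneity])

# Synthetic stand-in images: class 0 = smooth gradients, class 1 = random noise
rng = np.random.default_rng(0)
images, labels = [], []
for _ in range(40):
    ramp = np.tile(np.linspace(0, 255, 100), (80, 1)) + rng.normal(0, 2, (80, 100))
    images.append(np.clip(ramp, 0, 255).astype(np.uint8))
    labels.append(0)
    images.append(rng.integers(0, 256, (80, 100), dtype=np.uint8))
    labels.append(1)

X = np.array([glcm_feature_vector(resize_nearest(img)) for img in images])
y = np.array(labels)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Classification: KNN on the extracted GLCM features
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
print(f"validation accuracy on synthetic textures: {accuracy:.2f}")
```

On the real dataset, the synthetic images would be replaced by grayscale leaf images
loaded from the selected class folders, and k would be chosen by comparing several values.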