2.2 Review of Related Studies

2.2.1 Local Studies

Smart Farm: Automated Classifying and Grading System of Tomatoes using Fuzzy Logic

This research study by Lenard Dorado aimed to build a computer vision system that classifies and grades tomatoes using image processing and fuzzy logic in MATLAB. The proposed work is presented as one of the modern ways of farming (smart farming), with tomatoes serving as the test subject for the automated classifying and grading system. The project follows a series of steps. First, images of the tomatoes were captured and their features detected using image processing techniques. The fuzzy logic system then determines whether each tomato is good or bad. After classification, the good tomatoes are graded based on their level of ripeness. The automated system is also intended to reduce the possibility of error. The study is limited to demonstrating the accuracy and functionality of the system, which were tested by comparing the results of the manual method, performed by a human, with the results produced by the system. The system covers classification only and not the hardware, although the authors recommend hardware development for further research. The latter part of their paper concentrates on explaining fuzzy logic and MATLAB and how they were used in the project. (Dorado, Aguila and Caldo, 2016)

2.2.2 Foreign Studies

Classification of Green Coffee Bean Images Based on Defect Types using Convolutional Neural Network (CNN)

This study by Carlito Pinto, Junya Furukawa, Hidekazu Fukai, and Satoshi Tamura aimed to develop a system that automatically detects defects in green coffee beans for coffee production in Timor-Leste. As an initial step, they developed an image processing system that classifies each bean image based on the type of defect.
For the development of the classifier, they used a deep convolutional neural network. The classifier achieved accuracies ranging from 72.4% to 98.7%, depending on the defect type of the green coffee beans. The input of the system is a color picture of green coffee beans and the output is the defect classification. They labelled the green coffee beans into six classes: black, sour, fade, peaberry, damaged, and normal. The inputs of the neural network were the data values of the images in the dataset. They placed the green coffee beans on white paper before photographing them, using a digital camera in automatic mode with the settings F/16, ISO 200, 1/60 s exposure time and auto focus, placed 1 m above the beans. After photographing the front side of the beans, they also photographed the back side. Using image processing techniques, they isolated the individual beans, rescaled each image to 256 x 256 pixels, and labelled the images manually. They formed three sets of images from the prepared pictures: a training set, a validation set and a test set. The training set was used for the learning of the neural network. During the learning phase, the validation set was used to check the accuracy achieved during training. The test set was used to evaluate the sorting performance of the neural network with its final parameters. The researchers designed their own neural network and trained it from scratch, using different functions and steps to build their convolutional neural network. (Pinto et al., 2017)

Method of Coffee Bean Defect Detection

The purpose of the study “Method of Coffee Bean Defect Detection” by Betelihem Mesfin Ayitenfsu is to detect defects in coffee beans using image processing.
The author states that image processing techniques can exceed the limits of human capability in inspecting and grading the quality of green coffee beans. The study focused mainly on the size of the green beans and on broken beans. It uses machine vision and image processing techniques to analyze and grade coffee beans based on parameters such as a metric value that depends on the area and perimeter of the bean. An algorithm was presented to measure the key parameters: the area and the metric value of the coffee bean. To detect defective beans, these key parameters are compared with the model parameters. A SONY DSC-H10 8.1-megapixel digital camera was used to capture the images of the coffee beans. A stand was provided so that the camera could be moved easily relative to the beans. The main goal of the image processing is to determine the roundness, area and perimeter using bwboundaries, a boundary tracing routine in MATLAB, which served as the platform for the image processing. As output, the image processing classifies each sample into one of two categories. Of 100 coffee bean samples, 78.32% were detected as good and 19.68% as damaged, with 2% wrongly detected. (Ayitenfsu, 2014)

Transfer learning using Convolutional Neural Networks for Object Classification within X-ray Baggage Security Imagery

The work of Akcay et al., “Transfer learning using Convolutional Neural Networks for Object Classification within X-ray Baggage Security Imagery”, is another deep convolutional neural network (CNN) project, this time carried out through transfer learning and applied to object classification within X-ray baggage security imagery. They used transfer learning instead of the traditional approach, which requires a large amount of training data. According to the study, a CNN with transfer learning achieves superior performance compared to prior work.
They make use of the CNN configuration that won the ILSVRC-2012 competition (AlexNet), which has 5 convolutional layers and 3 fully-connected layers, with 60 million parameters and 650,000 neurons, and which was trained on the ImageNet dataset. They also employ the ILSVRC-2014 winner (GoogLeNet), which has many more layers (22) and 12 times fewer network parameters than AlexNet. They fine-tuned the networks using the backpropagation algorithm with stochastic gradient descent. During fine-tuning, they froze the parameters of certain layers so that those parameters were reused for the new dataset instead of being updated during training. They set up the classification in two ways: (a) 2 classes (guns vs no guns) and (b) 6 classes (firearm, firearm-components, knives, ceramic knives, camera, and laptop). They trained on their dataset with varying sets of frozen layers, e.g. layer 1 frozen, layers 1 and 2 frozen, layers 1, 2 and 3 frozen, and so on, and evaluated each result to find the setup with the highest accuracy. (Akcay et al., 2016)

Fine Tuning CNNs with Scarce Training Data – Adapting ImageNet to Art Epoch Classification

The objective of this study is to use transfer learning to overcome the problem of limited training data. The researchers performed transfer learning to create a system that classifies types of paintings. They had a limited amount of data because their main focus is on paintings, so they used images available on websites, specifically the Wikipaintings collection. They used the winner of ILSVRC-2012 (AlexNet), already trained on the ImageNet dataset. The pre-trained AlexNet model was then fine-tuned on their dataset from the Wikipaintings collection. The fine-tuned CNN was then evaluated and compared to linear models based on Improved Fisher Encodings. The classifier can classify paintings by art epoch, such as Baroque, Renaissance or Impressionism.
(Hentschel, Wiradarma and Sack, 2016)

The Effectiveness of Data Augmentation in Image Classification using Deep Learning

This study by Jason Wang of Stanford University and Luis Perez of Google evaluated data augmentation approaches for image classification. Cropping, rotating and flipping images are the traditional data augmentation techniques, which were developed and tested in earlier works. The researchers formed small subsets of ImageNet on which to perform and evaluate data augmentation. They reported that the traditional techniques mentioned above were among the successful ones. They also experimented with GANs (Generative Adversarial Networks) to produce images in different styles. They further proposed a method that lets a neural network learn the augmentations that best improve the classifier, which they call neural augmentation. The researchers limited their data to two classes and built neural network classifiers to recognize the class correctly in order to evaluate the effectiveness of the augmentation techniques. They trained a small neural network to perform the classification. CycleGAN was used to augment the images by transferring them to a fixed, predetermined style, such as night and day or winter and summer. As a final step, they explored and proposed a different kind of augmentation process in which two neural networks are connected: one transfers style and the other classifies. In this way, the network learns augmentations that reduce the classification loss. (Wang and Perez, 2017)

A New Image Classification Method Using CNN Transfer Learning and Web Data Augmentation

This work was done by Dongmei Hana, Qigang Liu, and Weiguo Fan. They proposed a two-phase method combining CNN transfer learning and web data augmentation to address the problem of limited training data.
With their method, the feature representation of a pre-trained neural network can be efficiently transferred to a new target task. They stated that the method not only reduces the requirement for a large dataset but also enlarges the existing training data. Together, these two techniques help address the over-fitting of deep CNNs on small datasets. The proposed method consists of two phases: phase one builds a powerful classifier using the current training data, and phase two augments the dataset with the help of the classifier developed in phase one. Their solution was applied to six small public datasets and performed better than the traditional approach. They stated that their experimental results show the proposed solution to be well suited to problems involving deep CNNs on small datasets. Their results also showed that ResNet achieved the highest accuracy among the state-of-the-art models on the six small datasets. (Hana, Liu and Fan, 2017)

Convolutional Neural Network Transfer Learning for Robust Face Recognition in NAO Humanoid Robot

This study evaluates two well-known CNN architectures, AlexNet and VGG-Face, for the face recognition task. Transfer learning is applied to the pre-trained networks to perform recognition. The face recognition framework requires only one example image per person to achieve accurate recognition. The framework was then implemented on the humanoid robot NAO to test the practicality and flexibility of the algorithm in a practical environment. The NAO’s low-resolution camera and a separate high-resolution camera were used to obtain the experimental results. This resulted in excellent recognition of a new person from a single example image under varying distance and resolution.
They retrained AlexNet on the CASIA-WebFace database in order to perform transfer learning on that architecture. The database consists of about half a million celebrity face images covering a total of 10,575 unique identities. They resized the images to fit the input layer of the CNN. AlexNet was trained using stochastic gradient descent (SGD) with an initial learning rate of 0.001. VGG-Face was left as is because it is already trained for face recognition. The results showed that VGG-Face is much more accurate than AlexNet. Nevertheless, the study showed that transfer learning can be used to accomplish a real-time face recognition task. They also concluded that image resolution does not have a great impact on performance. (Bussey et al., 2017)

A Machine Vision based Pistachio Sorting Using Transferred Mid-Level Image Representation of Convolutional Neural Network

This study aimed to build a computer vision system that separates open-shell pistachios from defective ones and from debris. Images of pistachios and of debris such as branches or twigs were fed to the new model, which uses a support vector classifier. A Canon 600D camera was used to capture the images. The images were taken under four different lighting conditions against a dark background, to resemble a conveyor belt setting. The pistachios were scattered. Each image contained multiple objects at a resolution of 5184 x 3456 and was cropped into 400 x 400 RGB images. They produced 1,000 unique images, which image augmentation increased to 20,000. Scaling, rotation, and changes in lighting conditions were the augmentation techniques used. They used image segmentation to detect each object individually, performing Canny edge detection followed by active contour fitting. They used two ILSVRC winners, AlexNet and GoogLeNet, as feature extractors through transfer learning.
They used MATLAB 2017 as their working platform. A linear support vector machine was used to classify the data into the desired output classes. As a result, they obtained 99% accuracy with the transferred weights of GoogLeNet and 98% with those of AlexNet. (Farazi, Zadeh and Moradi, 2017)

A Robust Deep-Learning-Based Detector for Real-Time Tomato Plant Diseases and Pest Recognition

This research study aimed to find the most suitable deep learning architecture, combined with deep feature extractors, for accurate and fast detection of diseases and pests in tomato plants. The researchers focused mainly on the identification and recognition of diseases and pests affecting tomato plants. Meta-architectures based on deep detectors were used to identify the region of interest in each image. The detectors used in this study are Faster Region-based Convolutional Neural Network, Region-based Fully Convolutional Network and Single Shot Detector, combined with deep feature extractors including VGG net and Residual Network. All images were captured under different conditions and scenarios using camera devices of various resolutions. The dataset consists of about 5,000 images gathered from different farms located in the Korean Peninsula. The dataset was manually annotated: areas of the tomato images showing diseases and pests were marked with a bounding box and labelled with a class. Data augmentation techniques such as flipping, rotation and cropping were also used to increase the number of images. The dataset was divided into 80% training, 10% testing and 10% validation sets. The system was trained and tested on an Intel Core i7 3.5 GHz processor with two NVIDIA GeForce Titan X GPUs. Overall, the system achieved a mean AP of more than 80% in the best cases.
(Alvaro Fuentes, 2017)

2.3 Synthesis

In the local study entitled “Smart Farm: Automated Classifying and Grading System of Tomatoes using Fuzzy Logic” (Dorado, Aguila and Caldo, 2016), the authors focused only on classifying and grading the object and not on developing a machine that sorts. Their goal of classification is similar to the goal of this paper: both papers focus on developing a system that classifies rather than a sorting machine. The previous study uses a two-stage system: the first stage classifies tomatoes as good or bad, and the second grades each good tomato based on its ripeness. The study of Dorado used MATLAB and fuzzy logic, while this paper used Python as the programming language and deep learning.

The study (Pinto et al., 2017), published by IEEE and entitled “Classification of Green Coffee Bean Images Based on Defect Types using Convolutional Neural Network (CNN)”, is similar to this paper in purpose, namely classifying green coffee beans. Although both studies focus on green coffee beans, they differ in the dataset used. The study (Pinto et al., 2017) classifies the green coffee beans by defect type, which leads to 6 output classes including the normal bean; this paper focuses on only 2 classes: the defective and the normal bean. Our study mainly focuses on Barako coffee green beans, which are exclusive to tropical countries. The previous study trained its deep neural network from scratch and successfully increased the accuracy of the system, while this paper relies on transfer learning. Some techniques from the previous study served as our reference, such as the techniques for capturing images of green coffee beans to produce a dataset.

The study “Method of Coffee Bean Defect Detection” (Ayitenfsu, 2014) is similar to this research paper in having the same number of output classes.
Neither paper focuses on a specific type of defect, but the study of Ayitenfsu considers only the roundness and area of the bean. Although this research paper does not distinguish the type of defect, it can detect all types of defect as one class, even when a defective bean has a standard area. The study of Ayitenfsu used MATLAB as the platform for image processing. In relation to this paper, some of its image capturing techniques were observed and used.

Although unrelated to the goal of this paper, which is classifying green coffee beans, the study “Transfer learning using Convolutional Neural Networks for Object Classification within X-ray Baggage Security Imagery” (Akcay et al., 2016) was used as a reference for transfer learning and convolutional neural networks. The two studies have different applications but use the same tools. The prior research focused on two models, AlexNet and GoogLeNet, while in this paper the top-performing models were evaluated to find the architecture best suited to the given task. Some techniques from the previous study were used as reference to achieve better results in this paper.

The work of Hentschel, Wiradarma and Sack (2016), “Fine Tuning CNNs with Scarce Training Data – Adapting ImageNet to Art Epoch Classification”, helped this study perform transfer learning using fine tuning. Given the limited source of data, the work of Hentschel et al. contributed greatly to improving this research study.

In the research study of Perez and Wang, “The Effectiveness of Data Augmentation in Image Classification using Deep Learning” (Wang and Perez, 2017), data augmentation was the main concern. To prevent overfitting, data augmentation was also performed in this paper, using some of the augmentation techniques cited in the previous study. This serves to increase the amount of training data needed in this study.
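The traditional augmentation techniques named in the reviewed studies (flipping, rotating and cropping) can be illustrated with a minimal sketch. It is not the implementation used in this paper or in any cited study: images are represented as plain nested lists of pixel values, and the function names `flip_horizontal`, `rotate_90`, `crop`, `augment` and `expand_dataset` are hypothetical names chosen for illustration.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def crop(image, top, left, height, width):
    """Cut a height x width window whose upper-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def augment(image, rng):
    """Return one randomly flipped and rotated copy of the image."""
    out = image
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rotate_90(out)
    return out


def expand_dataset(images, copies, seed=0):
    """Keep the originals and add `copies` augmented versions of each."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    expanded = list(images)
    for image in images:
        for _ in range(copies):
            expanded.append(augment(image, rng))
    return expanded


# Toy 2 x 3 "image" standing in for a photograph of a bean.
toy_image = [[1, 2, 3],
             [4, 5, 6]]
dataset = expand_dataset([toy_image], copies=3)
print(len(dataset), "images after augmentation")
```

In practice an image library would perform these operations on real photographs; the sketch only shows how each original image can yield several training samples.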
The study of Hana, Liu, and Fan entitled “A New Image Classification Method Using CNN Transfer Learning and Web Data Augmentation” (Hana, Liu and Fan, 2017) evaluated state-of-the-art deep CNN models to find out which models can be used for the given task. Likewise, this paper trained and evaluated state-of-the-art deep CNN models. Both studies were carried out through transfer learning and augmentation; the difference is that the previous study trained on many datasets, while this research uses only one dataset, the green coffee beans. The data augmentation also differs: the study of Hana, Liu, and Fan uses web data augmentation, while ours uses traditional augmentation.

The study “Convolutional Neural Network Transfer Learning for Robust Face Recognition in NAO Humanoid Robot” (Bussey et al., 2017) is related to this research study because both evaluated the state-of-the-art AlexNet model on a specific task. Both studies also apply transfer learning to fit the given task. This study is not limited to one model, and the researchers performed transfer learning on all of the models considered, unlike the previous work, which performed transfer learning on only one model and compared it with an existing model already trained for the task. The transfer learning techniques of the previous study, such as fine tuning, served as the reference for performing the same method in this paper.

The pistachio sorting study “A Machine Vision based Pistachio Sorting Using Transferred Mid-Level Image Representation of Convolutional Neural Network” (Farazi, Zadeh and Moradi, 2017) performed transfer learning for feature extraction and used an SVM for the classification task, while this study aimed to build a classifier using deep learning, specifically transfer learning. This study used the pre-trained CNN for both tasks.
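The idea of reusing a pre-trained network as a fixed feature extractor and training only a classifier on top of it can be sketched with a toy example. Everything here is an assumption made for illustration: `FROZEN_WEIGHTS` stands in for a pre-trained CNN, the classifier is a small logistic head rather than the SVM of the pistachio study or the networks used in this paper, and the four two-value samples are invented.

```python
import math

# Stand-in for pre-trained layers: these weights are never updated (frozen).
FROZEN_WEIGHTS = [[1.0, -1.0],
                  [0.5, 0.5]]


def extract_features(x):
    """Frozen layer: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_WEIGHTS]


def predict(head, features):
    """Probability of class 1 from the trainable logistic head."""
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=300, lr=0.1):
    """Train only the head with stochastic gradient descent on log loss."""
    head = {"weights": [0.0, 0.0], "bias": 0.0}
    # Features are computed once because the extractor does not change.
    features = [extract_features(x) for x in samples]
    for _ in range(epochs):
        for f, y in zip(features, labels):
            error = predict(head, f) - y  # gradient of log loss w.r.t. z
            head["weights"] = [w - lr * error * fi
                               for w, fi in zip(head["weights"], f)]
            head["bias"] -= lr * error
    return head


# Invented toy data: class 1 when the first value exceeds the second.
samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, 0, 0]
head = train_head(samples, labels)
for x, y in zip(samples, labels):
    print(x, "label", y, "predicted", round(predict(head, extract_features(x))))
```

Only the head's weights and bias change during training, which is the property that makes this kind of transfer learning workable with a small dataset. A deep learning framework would apply the same idea to a real pre-trained CNN.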
The study (Farazi, Zadeh and Moradi, 2017) is similar to (Ayitenfsu, 2014) and (Dorado, Aguila and Caldo, 2016) in using MATLAB as its platform, but this paper differs because the researchers used the Python language. Python has broad support and an active developer community, unlike MATLAB, which is limited to its own vendor. Image augmentation techniques such as rotating and scaling were also used in this paper.

The study entitled “A Robust Deep-Learning-Based Detector for Real-Time Tomato Plant Diseases and Pest Recognition” is related to this research study since this study also used deep meta-architectures such as Faster R-CNN and Single Shot Detector, combined with feature extractors including Inception V2, ResNet-50, MobileNet V1 and MobileNet V2. Both studies implement a robust deep-learning-based detector using images captured in complex scenarios.