DEPARTMENT OF COMPUTER SCIENCE

FISH IDENTIFICATION SYSTEM

By Diego Mushfieldt

A mini-thesis submitted in partial fulfilment of the requirements for the degree of BSc. Honours

Supervisor: Mehrdad Ghaziasgar
Co-Supervisor: James Connan
Date: 2011-11-02

ACKNOWLEDGEMENTS

I would like to take this opportunity to thank my parents for their constant motivation and inspiration, and for granting me the opportunity to study at the University of the Western Cape. I would also like to thank those dear to my heart, including my family, for their patience and understanding when I could not always be in their presence. Last, but certainly not least, I want to thank my supervisors, Mehrdad Ghaziasgar and James Connan, for believing in me and giving me the confidence I needed to keep working so hard at this project. Your dedication is truly an inspiration to me and to other students at UWC.

ABSTRACT

Aquariums display a large number of different kinds of fish, and visitors regularly want to know more about a particular fish. At present, visitors obtain this information by asking an expert or by scanning the documentation in and around the aquarium; however, this information is not always readily available. It is therefore desirable to create a system that makes such information readily available in an interactive manner. This project aims to develop a system that uses a video stream of a wide range of fish as its input. The user clicks on a particular fish, and the system classifies the fish and displays information about it accordingly. The system aims to give users an enjoyable and educational experience by allowing them to interact with it via the click of a mouse. The project is scalable enough to incorporate functionality such as a touch screen in order to improve interaction and enhance the user experience.
Table of Contents

CHAPTER 1: INTRODUCTION
  1.1 Computer Vision and Image Processing
  1.2 OpenCV (Open Computer Vision)
  1.3 Current Research
CHAPTER 2: USER REQUIREMENTS
  2.1 User's view of the problem
  2.2 Description of the problem
  2.3 Expectations from the software solution
  2.4 Not expected from the software solution
CHAPTER 3: REQUIREMENTS ANALYSIS
  3.1 Designer's Interpretation and Breakdown of the Problem
  3.2 Complete Analysis of the Problem
  3.3 Current Solutions
  3.4 Suggested Solution
CHAPTER 4: USER INTERFACE SPECIFICATION (UIS)
  4.1 The complete user interface
  4.2 The input video frame
  4.3 How the user interface behaves
CHAPTER 5: HIGH LEVEL DESIGN (HLD)
  5.1 Description of Concepts
  5.2 Relationships between objects
  5.3 Subsystems of HLD
  5.4 Complete Subsystem
CHAPTER 6: LOW LEVEL DESIGN (LLD)
  6.1 Low Level Description of Concepts
  6.2 Detailed Methodology
CHAPTER 7: TESTING AND RESULTS
CHAPTER 8: USER MANUAL
  8.1 Starting the System
  8.2 Load Video
CHAPTER 9: CODE DOCUMENTATION

CHAPTER 1
INTRODUCTION

1.1 Computer Vision and Image Processing

Computer vision [1] is the study of techniques that can be used to make machines see. In this context, 'see' refers to a machine being able to extract from an image the information necessary to solve some task.
The image can take many forms, such as a video sequence or views from multiple cameras. Image processing [2] is any kind of signal processing in which the input is an image and the output is either another image or a set of parameters related to the image.

1.2 OpenCV (Open Computer Vision)

OpenCV [3] is an open-source computer vision library written in C and C++. It runs on Linux, Windows and Mac OS. The library helps developers build complicated vision applications through its simple-to-use vision infrastructure of over 500 functions. Since computer vision and machine learning often go hand in hand, OpenCV also contains a machine learning library, which can be applied to many machine learning problems.

1.3 Current Research

Information about specific fish within an aquarium is not always readily available. At the moment, people obtain information either by scanning the documentation in the aquarium or by asking an expert. It is therefore desirable to develop a system that provides instant information about specific fish in an interactive manner. The proposed system identifies a fish, using OpenCV's libraries, by creating an image of the fish when the user clicks on it with a mouse. The image is processed using various algorithmic techniques, and the relevant information is then displayed on the screen.

CHAPTER 2
USER REQUIREMENTS

The following section describes the problem from the user's point of view. It is critical to gather information from the user in order to produce a meaningful solution.

2.1 User's view of the problem

The user requires the system to provide an easy mechanism for selecting a fish (in this case via a click of a mouse) on a live or pre-recorded video stream. The system should be able to classify the fish when it is clicked on, even while it is in motion within the video stream.
The system should also provide the user with information that is structured sensibly and is easy to understand. It is important to consider how user-friendly the system should be, so that the information is presented with clarity and remains unambiguous.

2.2 Description of the problem

The main purpose of this project is to develop an interactive system capable of providing instant feedback about a particular fish that the user is interested in. The system should help educate its users about different fish species by presenting certain facts about the particular type of fish selected.

2.3 Expectations from the software solution

The system is expected to classify one fish at a time when the user clicks on it in the live video stream. The focus of this project is not only on classifying the fish, but also on the kind of information displayed to the user and the manner in which it is displayed.

2.4 Not expected from the software solution

The system is not expected to display information about more than one fish at the same time; it can only process and perform analysis on one fish at a time. Since a single camera is used to capture the fish, the system can only process the fish in two dimensions from one camera angle; it is therefore not expected to do its processing in three dimensions.

CHAPTER 3
REQUIREMENTS ANALYSIS

The following section describes the problem from the designer's perspective and uses the previous chapter (Chapter 2) as a starting point.

3.1 Designer's Interpretation and Breakdown of the Problem

The aquarium hosts many visitors each year and displays a variety of fish. However, viewers are not always able to obtain instant information about the specific fish they are interested in learning more about. The input to the final system is a live or pre-recorded video stream.
Using a live video feed rather than a recorded video file is ideal and more practical, but difficult to implement in terms of algorithm efficiency and the number of frames per second. A camera is used to capture the fish while they are swimming. This allows the user to observe the fish and decide which particular fish he/she is interested in learning more about. The user interacts with the system by moving the mouse cursor over a specific fish and clicking on it. The location of the click is used to determine which fish was clicked on, and an image of that fish is created. The system uses image processing techniques and functions from the OpenCV (Open Computer Vision) libraries to classify the fish accordingly, and then displays the necessary output on the screen. The difficulty lies in processing the image to determine which fish was clicked on, fast enough to ensure real-time performance.

3.2 Complete Analysis of the Problem

3.2.1 Recording the fish in real time

A camera records the fish swimming in the fish tank. The system uses this input to display the live video feed of the fish in two dimensions on a screen, and to enable the user to select a particular fish via the click of a mouse.

3.2.2 Processing the image of the selected fish

After the user clicks on a specific fish, the system uses the location of the click to determine which fish was clicked on. The region of interest (ROI) is of importance in achieving this goal.

3.2.3 Displaying information to the user

The system is not required to display critical biological features of each fish. The information should be precise enough to educate the user, but concise enough that the user does not feel overwhelmed by too much information. Therefore, the system only displays the fish type.
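The click-to-ROI step described in 3.2.2 can be sketched in plain C++ (without OpenCV) as cropping a fixed-size window around the click, clamped to the image borders. The image layout, window size, and function names below are illustrative assumptions, not the system's actual code:

```cpp
#include <algorithm>
#include <vector>

// A minimal grayscale image: row-major pixel buffer plus dimensions.
struct Image {
    int width;
    int height;
    std::vector<unsigned char> pixels; // size == width * height
};

// Crop an up-to (2*halfW) x (2*halfH) region of interest centred on the
// click, clamped so the ROI never leaves the image. Illustrative only.
Image cropROI(const Image& src, int clickX, int clickY, int halfW, int halfH) {
    int x0 = std::max(0, clickX - halfW);
    int y0 = std::max(0, clickY - halfH);
    int x1 = std::min(src.width,  clickX + halfW);
    int y1 = std::min(src.height, clickY + halfH);

    Image roi{x1 - x0, y1 - y0, {}};
    roi.pixels.reserve(static_cast<size_t>(roi.width) * roi.height);
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            roi.pixels.push_back(src.pixels[static_cast<size_t>(y) * src.width + x]);
    return roi;
}
```

Further segmentation (thresholding, contour detection) then operates on the smaller ROI buffer rather than the whole frame, which is what keeps the per-click processing cheap.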
3.3 Current Solutions

A few existing systems are similar to this one. They are used to conduct fish surveys and to count fish in order to protect marine ecosystems. However, no existing system solves exactly the same problem as the proposed system. The similar systems mentioned above do make use of functions and techniques relevant to the Fish Identification System. Some of these systems are described below.

3.3.1 Real-time fish detection based on improved adaptive background [4]

This system is a fish behaviour monitoring system. It proposes a new approach to updating the background, based on frame difference and background difference, in order to detect fish in a real-time video sequence. Combining background difference and frame difference allows the background to be updated more correctly and completely with shorter computation times.

3.3.2 Recognizing Fish in Underwater Video [5]

This system uses a deformable template object recognition method for classifying fish species in underwater video. The method can serve as a component of a system that automatically identifies fish by species, improving upon previous work that only detects and tracks fish. Distance transforms are used to find globally optimal correspondences between the template model and an unknown image. Once the query images have been transformed into estimated alignment with the template, they are processed to extract texture properties.

3.3.3 Field Programmable Gate Array (FPGA) Based Fish Detection Using Haar Classifiers [6]

The quantification of the abundance, size, and distribution of fish is critical to properly managing and protecting marine ecosystems and regulating marine fisheries. This system automatically detects fish using a method based on the Viola-Jones Haar-like feature object detection method on a field programmable gate array (FPGA).
This method generates Haar classifiers for different fish species using OpenCV's Haar training code, which is based on the Viola-Jones detection method. This code allows a user to generate a Haar classifier for any object that is consistently textured and mostly rigid.

3.4 Suggested Solution

The suggested system will work effectively at classifying various types of fish. It is easy to modify, so additional functionality can be added when necessary. It is also cost-effective, since only one camera is used together with open-source software (OpenCV).

CHAPTER 4
USER INTERFACE SPECIFICATION (UIS)

The following section describes exactly what the user interface does, what it looks like, and how the user interacts with the program.

4.1 The complete user interface

The complete user interface is a Graphical User Interface (GUI). The user does not interact with the system through text commands. The figure below shows the user interface as it appears to the user.

Figure 1: User Interface

4.2 The input video frame

Once the system starts running, the video feed is displayed to the user within a window on the screen, as shown above. The user can then click on any fish within this window.

4.3 How the user interface behaves

The system displays the video feed once it is executed, then waits for input from the user via the click of a mouse. If the user clicks on a fish, the system responds by displaying an additional window that shows the classification of the fish.

Figure 2: Behaviour of User Interface

CHAPTER 5
HIGH LEVEL DESIGN (HLD)

In this section a high-level design (HLD) view of the problem is applied. Since the programming language of choice is C/C++, object-oriented analysis is not applied. A very high-level abstraction of the system is constructed as we analyse the methodology behind its construction.
5.1 Description of Concepts

Consider the system objects and their corresponding descriptions in the following table:

Ffmpeg — Ffmpeg is free and provides libraries for handling multimedia data. It is a command-line program used for transcoding multimedia files.

OpenCV — OpenCV is a library of programming functions aimed mainly at real-time computer vision and real-time image processing.

BGR2HSV — Converts a frame from the BGR colour space to HSV, which is less sensitive to dynamic lighting conditions.

Adaptive Threshold — The simplest method of image segmentation. This technique is used to create a binary image in which there exist only black and white pixels; it also includes ways to adapt to dynamic lighting conditions.

Region of Interest (ROI) — The ROI is set around the fish to segment it, since only the fish is of interest, not the entire image. The coordinates of the user's click are used to set the ROI.

Contour Detection — The edge pixels are assembled into contours. The largest contour is detected and is the only contour used to represent the final shape of the fish.

Histogram — The histogram represents the distribution of colour within an image.

Support Vector Machine (SVM) — An SVM is used to recognize patterns in pixel intensities and to classify which class a sample belongs to, making its decision based on data analysis. The ROI as well as the histogram values are sent to the SVM for training and testing the system.

Table 1: System objects and their descriptions

5.2 Relationships between objects

The figure below depicts the relationships between the objects:

Figure 3: Object Relations

5.3 Subsystems of HLD

Figure 4: Subsystems

5.4 Complete Subsystem

The figure below shows the high-level design and its subcomponents, which include more detail about the subsystem.

Figure 5: Complete Subsystem

CHAPTER 6
LOW LEVEL DESIGN (LLD)

In this section, explicit detail of all data types and functions is provided.
Pseudo code is also provided, covering all aspects of the programming effort without resorting to actual code.

6.1 Low Level Description of Concepts

BGR2HSV — cvCvtColor(bgrImg, hsvImg, CV_BGR2HSV)

Adaptive Threshold — cvAdaptiveThreshold(hsvImg, hsvThresh, 255, CV_ADAPTIVE_THRESH_GAUSSIAN_C, CV_THRESH_BINARY, 139, 0)

Region of Interest (ROI) — cvResize(hsvThresh, threshROI)

Contour Detection — cvFindContours(threshROI, storage, &first_contour, sizeof(CvContour), CV_RETR_CCOMP)

Draw Histogram — DrawHistogram(histImg)

Table 2: Low level view

6.2 Detailed Methodology

This section emphasises the methodology used to create the system by analysing each component in detail.

6.2.1 Video feed

The figure below depicts how the video feed is captured. The consecutive frames make up the video that is displayed on the user's monitor, as illustrated in Figure 6.

Figure 6: Video feed

6.2.2 User click

Processing starts once the user clicks on the fish.

Figure 7: User clicks on fish

6.2.3 BGR (Blue, Green, Red) to HSV (Hue, Saturation, Value)

Once a frame is captured from the video feed, it is converted from BGR to HSV. This is done because the HSV colour space is not as sensitive to dynamic lighting conditions as BGR.

Figure 8: Convert BGR to HSV

6.2.4 Adaptive Threshold

This method takes individual pixels and marks them as object pixels if their value is greater than some threshold value, and as background pixels otherwise. The resulting image is a binary image consisting only of black and white pixels. The most important part is the selection of the threshold value. In this system, the threshold value is selected manually by trial and error, observing which value removes the most noise. The Hue component of the HSV colour space is used, and the adaptive threshold is applied to this single-channel image.
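As a minimal illustration of the idea (not the system's actual OpenCV call), a mean-based adaptive threshold over a square neighbourhood can be written in plain C++. The block radius and offset constant here are arbitrary assumptions:

```cpp
#include <vector>

// Mark each pixel white (255) if it exceeds the mean of its (2*r+1)^2
// neighbourhood minus a small offset, else black (0). Pixels outside the
// image are simply skipped. Plain-C++ sketch of the mean-adaptive variant;
// not the real cvAdaptiveThreshold API.
std::vector<unsigned char> adaptiveThresholdMean(
        const std::vector<unsigned char>& img, int w, int h, int r, int offset) {
    std::vector<unsigned char> out(img.size(), 0);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int sum = 0, count = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    sum += img[ny * w + nx];
                    ++count;
                }
            }
            int mean = sum / count;
            out[y * w + x] = (img[y * w + x] > mean - offset) ? 255 : 0;
        }
    }
    return out;
}
```

Because each pixel is compared against its own local mean rather than one global value, bright fish stand out even where the tank lighting varies across the frame, which is the property the system relies on.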
Figure 9: Adaptive Threshold

6.2.5 Region of Interest

This method uses x and y coordinates to set borders around the object of interest (the fish). The larger image is then cropped in order to do further segmentation on a smaller image, in which only the fish is displayed.

Figure 10: Set ROI

6.2.6 Contour Detection and Flood Fill

This method computes contours from a binary image; in this case, the binary image is the threshold image, in which the edges are implicit as boundaries between positive and negative regions. The largest contour, which is the shape of the fish, is extracted to remove background noise, and it is then filled with white pixels to represent the shape of the fish.

Figure 11: Contour Detection and Flood Fill

6.2.7 Histogram

The histogram values are used to represent the colour distribution of the fish. The figure below illustrates how dominant 'orange' is within the RGB image.

Figure 12: Draw Histogram

6.2.8 Sending the Shape and Colour Representations to the Support Vector Machine (SVM) [7]

Once the shape and colour distribution of the fish are determined, this data is combined and sent to the SVM. Each fish is given a label (e.g. fish A has label 1, fish B has label 2, etc.) and the corresponding features (shape and colour) are combined for each label. The SVM then trains the system to recognize all the fish that are clicked on, each fish having its own unique set of features.

6.2.8.1 SVM Cross-validation and Grid-search [7]

The RBF kernel nonlinearly maps samples into a higher-dimensional space, so, unlike the linear kernel, it can handle the case where the relation between class labels and attributes is nonlinear [7]. The RBF kernel has two parameters, C and γ. Some form of model selection needs to be done to decide which C and γ are best for a given problem; the aim is to choose a good C and γ in order to accurately predict testing data.
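The model selection just described (choosing C and γ) can be sketched as a grid search in plain C++: enumerate exponentially spaced (C, γ) pairs and keep the pair with the highest cross-validation accuracy. The `evaluate` callback below is a stand-in for a real SVM cross-validation run (e.g. via LIBSVM), which this sketch does not include; the grid bounds and step are assumptions:

```cpp
#include <cmath>
#include <functional>
#include <utility>

// Try every (C, gamma) pair on exponential grids, C = 2^-5 .. 2^15 and
// gamma = 2^-15 .. 2^3 in steps of 2^2, and return the pair with the best
// accuracy. `evaluate` stands in for a real cross-validation run.
std::pair<double, double> gridSearch(
        const std::function<double(double, double)>& evaluate) {
    double bestC = 0.0, bestGamma = 0.0, bestAcc = -1.0;
    for (int ce = -5; ce <= 15; ce += 2) {
        for (int ge = -15; ge <= 3; ge += 2) {
            double C = std::pow(2.0, ce);
            double gamma = std::pow(2.0, ge);
            double acc = evaluate(C, gamma); // cross-validation accuracy in [0, 1]
            if (acc > bestAcc) {
                bestAcc = acc;
                bestC = C;
                bestGamma = gamma;
            }
        }
    }
    return {bestC, bestGamma};
}
```

The coarse exponential spacing keeps the number of expensive cross-validation runs small; in practice a second, finer grid around the winning pair can then be searched.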
In v-fold cross-validation, the training set is divided into v subsets of equal size. Each subset is tested in turn using the classifier trained on the remaining v-1 subsets. Each instance of the whole training set is therefore predicted once, so the cross-validation accuracy is the percentage of data correctly classified. This cross-validation procedure can prevent the overfitting problem. Figure 14 below illustrates the overfitting problem, whereby the classifier overfits the training data. In contrast, the classifier shown in Figure 13 does not overfit the training data and gives better cross-validation as well as testing accuracy.

Figure 13: Better Classifier

Figure 14: Overfitting problem

A "grid-search" on C and γ using cross-validation is recommended. Different pairs of (C, γ) values are tried, and only the pair with the best accuracy is chosen. To identify good parameters, exponentially growing sequences of C and γ are tried; for example, C = 2^-5, 2^-3, ..., 2^15 and γ = 2^-15, 2^-13, ..., 2^3.

6.2.8.2 Training the System

The videos used in the system were captured at the aquarium, with the camera on a tripod to keep it stationary. Since the tanks within the aquarium are very large, it is not easy to record a fish swimming at a constant distance from the camera. Therefore, the only frames used in the system are those in which the fish appears at a reasonable distance from the camera and maintains this distance for at least three or four seconds. Since most fish appeared only once in the videos at the desired distance, some of the training and testing videos are the same. This is acceptable since the duration of this project is only one year, and capturing different training and testing videos, though not impossible, is a complicated and tedious process.
In order to have totally different training and testing sets, the tank should not be too large, because the shape and orientation of a fish changes as it swims away from the camera. Each fish was trained with a total of about 40 training samples, including both the shape and colour data sent to the SVM. Each label in the SVM corresponds to a certain fish name, so when testing takes place, the SVM returns a value (the label) that is stored in a file; the system reads that label and prints the desired output to make the classification.

Figure 15: Send Shape and Colour Distribution to SVM

6.2.9 System Classification

If the user clicks on a particular fish, its features (shape and colour distribution) are sent to the SVM. Since the system has been trained prior to testing, the SVM allows the system to know more or less what each fish's features 'look' like. The SVM responds with a label; this label corresponds to a certain fish species, and the corresponding classification output is displayed to the user. Figure 16 below illustrates the classification process.

Figure 16: System Classification

CHAPTER 7
TESTING AND RESULTS

In order to correctly assess the accuracy of the system, each fish was clicked on at least 10 times, amounting to a total of 200 clicks (10 clicks for each of 20 fish). The result of the test is represented in the graph below, which shows the percentage of clicks for which each individual fish is classified correctly; these individual accuracies contribute to the overall accuracy of the system. The overall accuracy of the system amounts to 88%. This is a reasonable result, taking into account the number of different types of fish that need to be classified.

Figure 17: Individual Accuracy (percentage accuracy per fish type)

The graph shows that some fish are classified with an accuracy of 100%.
This can be due to the fact that these particular fish have very unique, distinctive shapes. There are also fish with a low accuracy of 50% or less, which can be because they take on shapes similar to those of other fish. Nevertheless, the system's performance is good overall.

CHAPTER 8
USER MANUAL

The demonstration mode of the GUI is illustrated in the figures that follow.

8.1 Starting the System

A window appears at start-up, asking whether or not the user wants to start the system. The system starts if the user clicks 'Yes' and exits if the user clicks 'No'.

8.2 Load Video

The system then requests a video from the user. After a video is selected, the user clicks 'Open'. The system then uses the selected video and displays it on the screen. The user can now interact with the system by clicking on any fish within the video.

CHAPTER 9
CODE DOCUMENTATION

The code has been fully documented, with comments inserted at each statement and each method. A description of all inputs and outputs is given, as well as caveats of all methods. The final source code is stored on a CD and placed in an envelope.

Conclusion

In Chapter 2 a detailed description of the problem is given, as well as the software solution to the problem. The user requires an easy-to-use, interactive system. Since the system includes a GUI, it is simple and easy to use: the user navigates through the system by mouse clicks instead of typing commands, which also makes the system interactive. The problem stated is that visitors at the aquarium do not have instant access to information about specific fish. The final system clearly meets this requirement by providing an easy-to-use interactive system that gives the user instant information about specific fish. Such a system is also educational and attracts people because it is interactive.

Bibliography

[1] Haslum, P. (n.d.). Computer Vision.
[2] Rapp, C. S. (1996). Image Processing and Image Enhancement. Johnson City, Texas, USA.
[3] Kaehler, G. B. (2008). Learning OpenCV. USA.
[4] Zhou Hongbin, X. G. (n.d.). Real-time fish detection based on improved adaptive background. HangZhou, Zhejiang Province, China.
[5] Andrew Rova, G. M. (n.d.). Recognizing Fish in Underwater Video.
[6] Bridget Benson, J. C. (n.d.). Field Programmable Gate Array (FPGA) Based Fish Detection Using Haar Classifiers. California San Diego, USA.
[7] Chih-Wei Hsu, C.-C. C.-J. (2003). A Practical Guide to Support Vector Classification. Taipei, Taiwan.