Diego Mushfieldt
A mini-thesis submitted in partial fulfilment of the
requirements for the degree of
BSc. Honours
Supervisor: Mehrdad Ghaziasgar
Co-Supervisor: James Connan
Date: 2011-11-02
I would like to take this opportunity to thank my parents for
their constant motivation and inspiration and for granting me
the opportunity to study at the University of the Western Cape.
I would also like to thank those dear to my heart including my
family for always having the patience and understanding when I
could not always be in their presence. Last, but certainly not
least, I want to thank my supervisors, Mehrdad Ghaziasgar and
James Connan, for believing in me and giving me the confidence
I needed to keep on working so hard at this project. Your
dedication is truly an inspiration to me and to other students at
the University of the Western Cape.
Aquariums display a large variety of fish, and visitors regularly want to know more about a particular kind of fish. Currently, visitors can obtain this information either by asking an expert or by scanning through the documentation in and around the aquarium. However, this information is not always readily available. It is therefore desirable to create a system that makes such information readily available in an interactive manner. This project aims to develop a system that takes as its input a video stream showing a wide range of fish. The user clicks on a particular fish, and the system classifies the fish and displays information about it. The system aims to give users an enjoyable and educational experience by allowing them to interact with it through mouse clicks. The system can be extended to incorporate functionality such as a touch screen in order to improve interaction and enhance the user experience.
Table of Contents
CHAPTER 1
INTRODUCTION
1.1 Computer Vision and Image Processing
1.2 OpenCV (Open Computer Vision)
1.3 Current Research
CHAPTER 2
USER REQUIREMENTS
2.1 User’s view of the problem
2.2 Description of the problem
2.3 Expectations from the software solution
2.4 Not expected from the software solution
CHAPTER 3
REQUIREMENTS ANALYSIS
3.1 Designer’s Interpretation and Breakdown of the Problem
3.2 Complete Analysis of the Problem
3.3 Current Solutions
3.4 Suggested Solution
CHAPTER 4
USER INTERFACE SPECIFICATION (UIS)
4.1 The complete user interface
4.2 The input video frame
4.3 How the user interface behaves
CHAPTER 5
HIGH LEVEL DESIGN (HLD)
5.1 Description of Concepts
5.2 Relationships between objects
5.3 Subsystems of HLD
5.4 Complete Subsystem
CHAPTER 6
LOW LEVEL DESIGN (LLD)
6.1 Low Level Description of Concepts
6.2 Detailed Methodology
CHAPTER 7
TESTING AND RESULTS
CHAPTER 8
USER MANUAL
8.1 Starting the System
8.2 Load Video
CHAPTER 9
CODE DOCUMENTATION
CHAPTER 1
INTRODUCTION
1.1 Computer Vision and Image Processing
Computer vision [1] is the study of techniques that make machines 'see'. In this context, 'see' means that a machine can extract from an image the information needed to solve some task. The image data can take many forms, such as video sequences or views from multiple cameras. Image processing is any form of signal processing in which the input is an image and the output is either another image or a set of parameters related to the image.
1.2 OpenCV (Open Computer Vision)
OpenCV is an open-source computer vision library written in C and C++ that runs on Linux, Windows and Mac OS. It helps people build sophisticated vision applications through a simple-to-use vision infrastructure, and the library contains over 500 functions. Because computer vision and machine learning often go hand in hand, OpenCV also contains a machine learning library, which is general enough to be applied to machine learning problems beyond vision.
1.3 Current Research
Information about specific fish within an aquarium is not always readily available. At present, visitors can obtain information either by scanning the documentation in the aquarium or by asking an expert. It is therefore desirable to develop a system that provides instant information about specific fish in an interactive manner. The proposed system uses OpenCV’s libraries to identify a fish: when the user clicks on the fish with a mouse, an image of the fish is created and processed using various algorithmic techniques, and the relevant information is then displayed on the screen.
CHAPTER 2
USER REQUIREMENTS
This chapter describes the problem from the user’s point of view. Gathering information from the user is critical to producing a meaningful solution.
2.1 User’s view of the problem
The user requires the system to provide an easy way to select a fish (in this case with a mouse click) in a live or pre-recorded video stream. The system should be able to classify the fish when it is clicked on, even while the fish is moving within the video stream. The system should also present the user with information that is sensibly structured and easy to understand. User-friendliness is therefore very important: the information must be presented clearly so that it remains unambiguous.
2.2 Description of the problem
The main purpose of this project is to develop an interactive
system that is capable of providing instant feedback about a
particular fish which the user is interested in. The system should
be able to assist in educating its users about different fish species
by presenting certain facts about the particular type of fish that
the user is interested in.
2.3 Expectations from the software solution
The system is expected to classify one fish at a time when the
user clicks on it in the live video stream. The focus of this
project is not only on classifying the fish, but also on the kind of
information displayed to the user as well as the manner in which
information is displayed.
2.4 Not expected from the software solution
The system is not expected to display information about more than one fish at the same time. Accordingly, the system processes and analyses only one fish at a time. Since a single camera is used to capture the fish, the system processes the fish in two dimensions from a single camera angle, and it is not expected to do any of its processing in three dimensions.
CHAPTER 3
REQUIREMENTS ANALYSIS
This chapter describes the problem from the designer’s perspective, using the previous chapter (CHAPTER 2) as a starting point.
3.1 Designer’s Interpretation and Breakdown of the Problem
The aquarium hosts many visitors each year and displays various fish. However, visitors are not always able to obtain instant information about the specific fish they would like to learn more about. The input to the final system is a live or pre-recorded video stream. A live video feed is ideal and more practical than a recorded video file, but it is harder to implement because the algorithms must be efficient enough to keep up with the number of frames per second. A camera is used to capture or record the fish while they are swimming. This allows the user to observe the fish and decide which particular fish he/she is interested in learning more about. The user interacts with the system by moving the mouse cursor over a specific fish and clicking on it. The location of the click is used to determine which fish was clicked on, and an image of that fish is created. The system uses image processing techniques and functions from the OpenCV (Open Computer Vision) libraries to classify the fish, and then displays the result on the screen. The difficulty lies in processing the image to determine which fish was clicked on, fast enough to ensure real-time processing.
3.2 Complete Analysis of the Problem
3.2.1 Recording the fish in real-time
A camera is used in order to record the fish swimming in the
fish tank. This input from the camera is used by the system in
order to display the live video feed of the fish in two dimensions
on a screen, and to enable the user to select a particular fish via
the click of a mouse.
3.2.2 Processing the image of the selected fish
After the user clicks on a specific fish, the system uses the location of the click to determine which fish was clicked on. A region of interest (ROI) set around the click location is central to achieving this goal.
3.2.3 Displaying information to the user
The system is not required to display detailed biological features of each fish. The information should be precise enough to educate the user but concise enough that the user does not feel overwhelmed by too much information. Therefore, the system displays only the fish type.
3.3 Current Solutions
There are a few systems that are similar to this one. They are used to conduct fish surveys and to count fish in order to protect marine ecosystems. However, no existing system solves exactly the same problem as the proposed system. Some of these similar systems, such as the Fish Identification System, do make use of functions and techniques that are necessary to solve this problem. Some of these systems are described below.
3.3.1 Real-time fish detection based on improved adaptive background [4]
This is a fish behaviour monitoring system. It proposes a new approach to updating the background, based on frame difference and background difference, in order to detect fish in a real-time video sequence. By combining background difference and frame difference, the system updates the background more accurately and completely, with shorter computation times.
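The idea of combining the two differences can be illustrated with a minimal C++ sketch. It is not the algorithm of [4]: it assumes 8-bit grayscale frames stored row-major in a std::vector, and the threshold `thresh` and blending rate `alpha` are hypothetical parameters. Here the background difference marks the moving fish, while the frame difference decides where the background may safely be updated.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical representation: an 8-bit grayscale frame stored row-major.
using Gray = std::vector<std::uint8_t>;

// Background difference: a pixel is foreground (255) when it differs from
// the background model by more than `thresh`, otherwise background (0).
Gray foregroundMask(const Gray& frame, const Gray& background, int thresh) {
    assert(frame.size() == background.size());
    Gray mask(frame.size(), 0);
    for (std::size_t i = 0; i < frame.size(); ++i) {
        if (std::abs(int(frame[i]) - int(background[i])) > thresh) mask[i] = 255;
    }
    return mask;
}

// Frame difference: pixels that barely changed since the previous frame are
// treated as static and blended into the background with rate `alpha`;
// pixels that changed (likely a moving fish) keep their old background value.
Gray updatedBackground(const Gray& background, const Gray& frame, const Gray& prev,
                       int thresh, double alpha) {
    assert(frame.size() == background.size() && frame.size() == prev.size());
    Gray next = background;
    for (std::size_t i = 0; i < frame.size(); ++i) {
        if (std::abs(int(frame[i]) - int(prev[i])) <= thresh) {
            const double blended = background[i] + alpha * (int(frame[i]) - int(background[i]));
            next[i] = static_cast<std::uint8_t>(std::lround(blended));
        }
    }
    return next;
}
```

Keeping moving pixels out of the update prevents a slow fish from gradually being absorbed into the background model.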
3.3.2 Recognizing Fish in Underwater video [5]
This system uses a deformable template object recognition method to classify fish species in underwater video. The method can serve as a component of a system that automatically identifies fish by species, improving upon previous work that only detects and tracks fish. Distance transforms are used to find globally optimal correspondences between the template model and an unknown image. Once the query images have been transformed into estimated alignment with the template, they are processed to extract texture properties.
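Distance transforms make template matching cheap: once every pixel stores its distance to the nearest edge pixel, the fit of a template can be scored by averaging these distances at the template's edge points (chamfer matching). The sketch below computes an exact city-block (L1) distance transform with the classic two-pass algorithm and uses it for such a score. It is a simplified stand-in assuming a row-major binary edge image; it is not the deformable template matching of [5].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Two-pass city-block distance transform of a row-major w x h edge image.
// Edge pixels (non-zero) get 0; every other pixel gets the L1 distance to the nearest edge.
std::vector<int> distanceTransform(const std::vector<int>& edges, int w, int h) {
    assert(edges.size() == static_cast<std::size_t>(w) * h);
    const int inf = std::numeric_limits<int>::max() / 2;  // halved so inf + 1 cannot overflow
    std::vector<int> d(edges.size());
    for (std::size_t i = 0; i < edges.size(); ++i) d[i] = edges[i] ? 0 : inf;
    // Forward pass: propagate distances from the top and left neighbours.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int& v = d[y * w + x];
            if (y > 0) v = std::min(v, d[(y - 1) * w + x] + 1);
            if (x > 0) v = std::min(v, d[y * w + x - 1] + 1);
        }
    // Backward pass: propagate distances from the bottom and right neighbours.
    for (int y = h - 1; y >= 0; --y)
        for (int x = w - 1; x >= 0; --x) {
            int& v = d[y * w + x];
            if (y + 1 < h) v = std::min(v, d[(y + 1) * w + x] + 1);
            if (x + 1 < w) v = std::min(v, d[y * w + x + 1] + 1);
        }
    return d;
}

struct Point { int x, y; };

// Chamfer score: mean distance-transform value under the template's edge points
// placed at offset (dx, dy). Lower means a better fit; points outside the image are skipped.
double chamferScore(const std::vector<int>& dt, int w, int h,
                    const std::vector<Point>& templ, int dx, int dy) {
    long long sum = 0;
    int used = 0;
    for (const Point& p : templ) {
        const int x = p.x + dx, y = p.y + dy;
        if (x < 0 || x >= w || y < 0 || y >= h) continue;
        sum += dt[y * w + x];
        ++used;
    }
    return used ? double(sum) / used : std::numeric_limits<double>::infinity();
}
```

Sliding the template over all offsets and keeping the lowest score gives the best rigid alignment; a deformable model additionally lets template parts move relative to each other.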
3.3.3 Field Programmable Gate Array (FPGA) Based Fish Detection Using Haar Classifiers [6]
The quantification of the abundance, size and distribution of fish is critical to properly managing and protecting marine ecosystems and regulating marine fisheries. This system is designed to detect fish automatically using a method based on the Viola-Jones Haar-like feature object detection method, implemented on a field programmable gate array (FPGA). Haar classifiers for different fish species are generated using OpenCV’s Haar training code, which is based on the Viola-Jones detection method. This code allows a user to generate a Haar classifier for any object that is consistently textured and mostly rigid.
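Haar-like features are fast to evaluate because of the integral image used in the Viola-Jones method: after one pass over the image, the sum of any rectangle costs four look-ups. The sketch below, assuming a row-major integer image, builds an integral image and evaluates a two-rectangle (left half minus right half) feature. It only illustrates the feature computation, not OpenCV's Haar training code or the FPGA design of [6].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Integral image with an extra leading row and column of zeros: at(x, y) is the
// sum of all pixels strictly above and to the left of (x, y).
class IntegralImage {
public:
    IntegralImage(const std::vector<int>& img, int w, int h)
        : w_(w), ii_(static_cast<std::size_t>(w + 1) * (h + 1), 0) {
        assert(img.size() == static_cast<std::size_t>(w) * h);
        for (int y = 0; y < h; ++y) {
            long long rowSum = 0;  // running sum of the current row up to x
            for (int x = 0; x < w; ++x) {
                rowSum += img[y * w + x];
                ii_[(y + 1) * (w + 1) + (x + 1)] = ii_[y * (w + 1) + (x + 1)] + rowSum;
            }
        }
    }

    // Sum of the rw x rh rectangle whose top-left pixel is (x, y), using four look-ups.
    long long rectSum(int x, int y, int rw, int rh) const {
        return at(x + rw, y + rh) - at(x + rw, y) - at(x, y + rh) + at(x, y);
    }

private:
    long long at(int x, int y) const { return ii_[y * (w_ + 1) + x]; }
    int w_;
    std::vector<long long> ii_;
};

// Two-rectangle Haar-like feature: sum of the left half minus sum of the right half.
long long twoRectFeature(const IntegralImage& ii, int x, int y, int rw, int rh) {
    assert(rw % 2 == 0);
    const int half = rw / 2;
    return ii.rectSum(x, y, half, rh) - ii.rectSum(x + half, y, half, rh);
}
```

Because each feature costs a constant number of look-ups regardless of its size, a cascade can evaluate thousands of features per window, which is what makes hardware implementations such as the FPGA design practical.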
3.4 Suggested Solution
The suggested solution is designed to classify various types of fish effectively. It is easy to modify, so that additional functionality can be added when necessary. It is also cost-effective, since only one camera and open-source software (OpenCV) are used.
CHAPTER 4
USER INTERFACE SPECIFICATION (UIS)
This chapter describes exactly what the user interface does, what it looks like and how the user interacts with the program.
4.1 The complete user interface
The complete user interface is a Graphical User Interface (GUI).
Text commands are not used by the user to interact with the
system. The figure below shows the user interface as it appears
to the user.
Figure 1: User Interface
4.2 The input video frame
Once the system starts running, the video feed will be displayed
to the user within a window on the screen as shown above. The
user can now click on any fish within this window.
4.3 How the user interface behaves
The system will display the video feed once it is executed. It
then waits for input from the user via the click of a mouse. If
the user clicks on a fish, the system will respond by displaying an
additional window that shows the classification of the fish.
Figure 2: Behaviour of User Interface
CHAPTER 5
HIGH LEVEL DESIGN (HLD)
This chapter presents a high level design (HLD) view of the problem. Since the programming language of choice is C/C++, object-oriented analysis is not applied. The system is described at a very high level of abstraction as we analyse the methodology behind its construction.
5.1 Description of Concepts
Consider the system objects and their corresponding descriptions in the following table:
Ffmpeg: Ffmpeg is free software that provides libraries for handling multimedia data. Its command-line program is used for transcoding multimedia files.
OpenCV: OpenCV is a library of programming functions aimed mainly at real-time computer vision and image processing.
Adaptive Threshold: Thresholding is the simplest method of image segmentation. It creates a binary image that contains only black and white pixels. The adaptive variant computes the threshold locally, which allows it to adapt to changing lighting conditions.
Region of Interest (ROI): The ROI is set around the fish to segment it, since only the fish, not the entire image, is of interest. The coordinates of the user’s click are used to set the ROI.
Contour Detection: The edge pixels are assembled into contours. Only the largest contour is kept, and it is used to represent the final shape of the fish.
Histogram: The histogram represents the distribution of colour within an image.
Support Vector Machine (SVM): An SVM recognises patterns in the pixel intensities of the shape image and in the histogram values, and decides, based on this analysis, which class (fish type) the input belongs to. The ROI and the histogram values are sent to the SVM for training and testing the system.
Table 1: System Objects and their descriptions
5.2 Relationships between objects
The figure below depicts the relationships between the objects:
Figure 3: Object Relations
5.3 Subsystems of HLD
Figure 4: Subsystems
5.4 Complete Subsystem
The figure below shows the high level design and its subcomponents which include more detail about the subsystem.
Figure 5: Complete Subsystem
CHAPTER 6
LOW LEVEL DESIGN (LLD)
This chapter provides explicit detail of all data types and functions. Pseudo code is also provided, together with all aspects of the programming effort, without resorting to actual code.
6.1 Low Level Description of Concepts
BGR to HSV: cvCvtColor (bgrImg, hsvImg, CV_BGR2HSV)
Adaptive Threshold: cvAdaptiveThreshold (hsvImg, hsvThresh, …)
Region of Interest (ROI): cvResize (hsvThresh, threshROI)
Contour Detection: cvFindContours (threshROI, storage, &first_contour, sizeof (CvContour), …)
Draw Histogram: DrawHistogram (histImg)
Table 2: Low Level view
6.2 Detailed Methodology
This section describes the methodology used to create the system by analysing each component in detail.
6.2.1 Video feed
Figure 6 below depicts how the video feed is captured: consecutive frames make up the video that is displayed on the user’s monitor.
Figure 6: Video feed
6.2.2 Processing starts once the user clicks on the fish.
Figure 7: User clicks on fish
6.2.3 BGR (Blue, Green, Red) to HSV (Hue, Saturation, Value)
Once a frame is captured from the video feed, it is converted from BGR to HSV. This is done because the HSV colour space separates colour (hue) from intensity (value), which makes it less sensitive to changing lighting conditions than BGR.
Figure 8: Convert BGR to HSV
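The per-pixel conversion follows the standard formulas that OpenCV documents for 8-bit images, where the hue (0 to 360 degrees) is halved so that it fits into the range 0 to 179. The sketch below converts a single pixel; it is an illustration of those formulas, and OpenCV's own implementation may round some values slightly differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Hsv8 { int h, s, v; };  // h in [0, 179], s and v in [0, 255]

// Converts one 8-bit BGR pixel to 8-bit HSV:
// V = max(R, G, B), S = 255 * (V - min) / V, H in degrees halved to fit a byte.
Hsv8 bgrToHsv(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    const double vmax = std::max({double(b), double(g), double(r)});
    const double vmin = std::min({double(b), double(g), double(r)});
    const double diff = vmax - vmin;
    const double s = vmax > 0.0 ? 255.0 * diff / vmax : 0.0;
    double hueDeg = 0.0;  // hue is undefined for greys; 0 is used by convention
    if (diff > 0.0) {
        if (vmax == r)      hueDeg = 60.0 * (g - b) / diff;
        else if (vmax == g) hueDeg = 120.0 + 60.0 * (b - r) / diff;
        else                hueDeg = 240.0 + 60.0 * (r - g) / diff;
        if (hueDeg < 0.0) hueDeg += 360.0;
    }
    const int h = static_cast<int>(std::lround(hueDeg / 2.0)) % 180;  // 360 degrees wraps to 0
    return {h, static_cast<int>(std::lround(s)), static_cast<int>(vmax)};
}
```

For example, an orange pixel (B = 0, G = 128, R = 255) maps to a hue of about 15, and it keeps that hue when the whole pixel is darkened uniformly, which is why the Hue channel is used in the following steps.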
6.2.4 Adaptive Threshold
Thresholding takes individual pixels and marks them as object pixels if their value is greater than a threshold value and as background pixels otherwise. The resulting image is a binary image that consists only of black and white pixels. An adaptive threshold does not use a single global value: the threshold for each pixel is computed from the pixels in its neighbourhood, which helps it cope with uneven lighting. The most important part is the selection of the threshold parameters. In this system they were selected manually, by trial and error, to find the values that remove the most noise. The Hue component of the HSV colour space is extracted, and the adaptive threshold is then applied to this single-channel image.
Figure 9: Adaptive Threshold
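A mean adaptive threshold on a single-channel image, such as the Hue channel, can be sketched as follows: each pixel is compared with the mean of its blockSize x blockSize neighbourhood minus a constant C, which mirrors the idea behind OpenCV's mean adaptive threshold. The names `blockSize` and `C` and the border handling (the neighbourhood is simply clipped at the image edge) are assumptions of this sketch; OpenCV handles borders differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mean adaptive threshold on a row-major single-channel w x h image.
// A pixel becomes 255 if it is greater than (local mean - C), otherwise 0.
// The neighbourhood is blockSize x blockSize (odd) and is clipped at the borders.
std::vector<std::uint8_t> adaptiveThresholdMean(const std::vector<std::uint8_t>& src,
                                                int w, int h, int blockSize, double C) {
    assert(blockSize % 2 == 1 && src.size() == static_cast<std::size_t>(w) * h);
    const int r = blockSize / 2;
    std::vector<std::uint8_t> dst(src.size(), 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            long sum = 0;
            int count = 0;
            for (int ny = std::max(0, y - r); ny <= std::min(h - 1, y + r); ++ny)
                for (int nx = std::max(0, x - r); nx <= std::min(w - 1, x + r); ++nx) {
                    sum += src[ny * w + nx];
                    ++count;
                }
            const double localThreshold = double(sum) / count - C;
            if (src[y * w + x] > localThreshold) dst[y * w + x] = 255;
        }
    return dst;
}
```

A production version would compute the local means with an integral image or box filter instead of the nested loops, but the result is the same.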
6.2.5 Region of Interest
This method uses the x and y coordinates of the user’s click to set borders around the object of interest (the fish). The larger image is then cropped so that further segmentation can be done on a smaller image in which only the fish is displayed.
Figure 10: Set ROI
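Setting the ROI amounts to placing a box around the click and clamping it to the frame, so that clicks near the edge still yield a valid crop. The sketch below assumes a row-major single-channel image and a hypothetical fixed box size; in the OpenCV C API the resulting rectangle could, for example, be passed to cvSetImageROI.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rect { int x, y, width, height; };

// Centres a boxW x boxH rectangle on the click and clamps it inside an imgW x imgH image.
Rect roiAroundClick(int clickX, int clickY, int boxW, int boxH, int imgW, int imgH) {
    boxW = std::min(boxW, imgW);
    boxH = std::min(boxH, imgH);
    const int x = std::max(0, std::min(clickX - boxW / 2, imgW - boxW));
    const int y = std::max(0, std::min(clickY - boxH / 2, imgH - boxH));
    return {x, y, boxW, boxH};
}

// Copies the pixels inside `roi` out of a row-major image of width imgW.
std::vector<std::uint8_t> crop(const std::vector<std::uint8_t>& img, int imgW, const Rect& roi) {
    assert(roi.x >= 0 && roi.y >= 0 && roi.x + roi.width <= imgW);
    std::vector<std::uint8_t> out;
    out.reserve(static_cast<std::size_t>(roi.width) * roi.height);
    for (int y = roi.y; y < roi.y + roi.height; ++y)
        for (int x = roi.x; x < roi.x + roi.width; ++x)
            out.push_back(img[static_cast<std::size_t>(y) * imgW + x]);
    return out;
}
```

Clamping rather than shrinking the box keeps every crop the same size, which is convenient when the cropped shape is later turned into a fixed-length feature vector.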
6.2.6 Contour Detection and Flood Fill
This method finds the contours in a binary image. In this case the binary image is the threshold image, in which the image edges are implicit as boundaries between positive and negative regions. The largest contour, which is the shape of the fish, is extracted to remove background noise, and it is then filled with white pixels to represent the shape of the fish.
Figure 11: Contour Detection and Flood Fill
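A comparable filled silhouette can be sketched without OpenCV. Instead of tracing contours, the code below keeps the largest 4-connected white region (found by breadth-first flood fill) and then fills its holes by turning white every background pixel that cannot be reached from the image border. This is a swapped-in connected-component approach for illustration, not the cvFindContours-based code of the system; the row-major 0/255 mask is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

using Binary = std::vector<std::uint8_t>;  // row-major mask, 0 = black, 255 = white

// Calls f(n) for every in-bounds 4-neighbour n of pixel index p.
template <typename F>
void forEachNeighbour4(int p, int w, int h, F f) {
    const int x = p % w, y = p / w;
    if (x > 0) f(p - 1);
    if (x + 1 < w) f(p + 1);
    if (y > 0) f(p - w);
    if (y + 1 < h) f(p + w);
}

// Keeps only the largest 4-connected white region, removing smaller noise blobs.
Binary keepLargestComponent(const Binary& img, int w, int h) {
    assert(img.size() == static_cast<std::size_t>(w) * h);
    std::vector<int> label(img.size(), -1);
    int next = 0, bestLabel = -1;
    std::size_t bestSize = 0;
    for (int start = 0; start < w * h; ++start) {
        if (img[start] == 0 || label[start] != -1) continue;
        std::size_t size = 0;
        std::queue<int> q;
        label[start] = next;
        q.push(start);
        while (!q.empty()) {  // breadth-first flood fill of one region
            const int p = q.front();
            q.pop();
            ++size;
            forEachNeighbour4(p, w, h, [&](int n) {
                if (img[n] != 0 && label[n] == -1) { label[n] = next; q.push(n); }
            });
        }
        if (size > bestSize) { bestSize = size; bestLabel = next; }
        ++next;
    }
    Binary out(img.size(), 0);
    for (std::size_t i = 0; i < img.size(); ++i)
        if (bestLabel != -1 && label[i] == bestLabel) out[i] = 255;
    return out;
}

// Fills holes: black pixels that cannot be reached from the border become white.
Binary fillHoles(const Binary& mask, int w, int h) {
    assert(mask.size() == static_cast<std::size_t>(w) * h);
    std::vector<bool> outside(mask.size(), false);
    std::queue<int> q;
    auto seed = [&](int p) {
        if (mask[p] == 0 && !outside[p]) { outside[p] = true; q.push(p); }
    };
    for (int x = 0; x < w; ++x) { seed(x); seed((h - 1) * w + x); }
    for (int y = 0; y < h; ++y) { seed(y * w); seed(y * w + w - 1); }
    while (!q.empty()) {
        const int p = q.front();
        q.pop();
        forEachNeighbour4(p, w, h, seed);
    }
    Binary out(mask.size());
    for (std::size_t i = 0; i < mask.size(); ++i) out[i] = outside[i] ? 0 : 255;
    return out;
}
```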
6.2.7 Histogram
The histogram values are used to represent the colour distribution of the fish. The figure below illustrates how dominant ‘orange’ is within the RGB image.
Figure 12: Draw Histogram
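A normalised colour histogram of this kind can be computed as in the sketch below. It assumes, for illustration, that the histogram is taken over the 8-bit Hue channel (0 to 179) and counts only pixels inside the filled fish mask, so that the background colour does not leak into the feature; the number of bins is a free parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Normalised hue histogram of the pixels inside a fish mask.
// hue: 8-bit Hue channel (0..179, OpenCV convention); mask: 0 = ignore, non-zero = fish.
// Returns `bins` values that sum to 1, or all zeros if the mask is empty.
std::vector<double> hueHistogram(const std::vector<std::uint8_t>& hue,
                                 const std::vector<std::uint8_t>& mask, int bins) {
    assert(bins > 0 && hue.size() == mask.size());
    std::vector<double> hist(bins, 0.0);
    std::size_t counted = 0;
    for (std::size_t i = 0; i < hue.size(); ++i) {
        if (mask[i] == 0) continue;
        const int h = std::min<int>(hue[i], 179);  // guard against out-of-range hue values
        hist[h * bins / 180] += 1.0;
        ++counted;
    }
    if (counted > 0)
        for (double& v : hist) v /= static_cast<double>(counted);
    return hist;
}
```

Normalising by the number of fish pixels makes the feature independent of how large the fish appears in the frame.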
6.2.8 Sending the Shape and Colour representations to the Support Vector Machine (SVM) [7]
Once the shape and colour distribution of the fish have been determined, this data is combined and sent to the SVM. Each fish is given a label (e.g. fish A has label 1, fish B has label 2, etc.), and the corresponding features (shape and colour) are combined for each label. The SVM is then trained to recognise every fish that can be clicked on, each fish having its own unique set of features.
SVM Cross-validation and Grid-search [7]
The RBF kernel nonlinearly maps samples into a higher dimensional space, so, unlike the linear kernel, it can handle the case where the relation between class labels and attributes is nonlinear [4]. When the RBF kernel is used there are two parameters, C and γ, and some form of model selection is needed to decide which C and γ are best for a given problem. The aim is to choose C and γ so that the classifier accurately predicts testing data. In v-fold cross-validation, the training set is divided into v subsets of equal size. Each subset in turn is tested using the classifier trained on the remaining v-1 subsets. Therefore, each instance of the whole training set is predicted once, and the cross-validation accuracy is the percentage of data that are correctly classified. This cross-validation procedure can prevent the overfitting problem. Figure 14 illustrates the overfitting problem, in which the classifier overfits the training data. In contrast, the classifier shown in Figure 13 does not overfit the training data and gives better cross-validation as well as testing accuracy.
Figure 13: Better Classifier
Figure 14: Overfitting problem
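The RBF kernel, v-fold cross-validation and an exponential (C, γ) grid of the kind recommended in [7] can be sketched together as follows. The SVM itself is left abstract: `TrainAndPredict` and the `cvAccuracy` callback are hypothetical stand-ins for whichever SVM implementation is used (for example LIBSVM or OpenCV's CvSVM), so this is a sketch of the model-selection loop, not the system's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2).
double rbfKernel(const std::vector<double>& x, const std::vector<double>& y, double gamma) {
    assert(x.size() == y.size());
    double sq = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - y[i];
        sq += d * d;
    }
    return std::exp(-gamma * sq);
}

// Assigns sample i to fold i % v, giving v subsets of (nearly) equal size.
std::vector<int> foldAssignment(std::size_t n, int v) {
    assert(v > 0);
    std::vector<int> fold(n);
    for (std::size_t i = 0; i < n; ++i) fold[i] = static_cast<int>(i % v);
    return fold;
}

// Hypothetical callback: trains on the `train` indices and returns one predicted label per `test` index.
using TrainAndPredict = std::function<std::vector<int>(const std::vector<std::size_t>& train,
                                                       const std::vector<std::size_t>& test)>;

// v-fold cross-validation accuracy: every sample is predicted exactly once,
// by a model trained on the other v-1 folds.
double crossValidationAccuracy(const std::vector<int>& labels, int v, const TrainAndPredict& trainAndPredict) {
    const std::vector<int> fold = foldAssignment(labels.size(), v);
    std::size_t correct = 0;
    for (int k = 0; k < v; ++k) {
        std::vector<std::size_t> train, test;
        for (std::size_t i = 0; i < labels.size(); ++i) (fold[i] == k ? test : train).push_back(i);
        const std::vector<int> predicted = trainAndPredict(train, test);
        assert(predicted.size() == test.size());
        for (std::size_t j = 0; j < test.size(); ++j)
            if (predicted[j] == labels[test[j]]) ++correct;
    }
    return labels.empty() ? 0.0 : double(correct) / labels.size();
}

struct GridResult { double c, gamma, accuracy; };

// Grid search over C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3,
// keeping the pair with the highest cross-validation accuracy.
GridResult gridSearch(const std::function<double(double c, double gamma)>& cvAccuracy) {
    GridResult best{0.0, 0.0, -1.0};
    for (int ec = -5; ec <= 15; ec += 2)
        for (int eg = -15; eg <= 3; eg += 2) {
            const double c = std::ldexp(1.0, ec), gamma = std::ldexp(1.0, eg);
            const double acc = cvAccuracy(c, gamma);
            if (acc > best.accuracy) best = {c, gamma, acc};
        }
    return best;
}
```

In practice `cvAccuracy(c, gamma)` would call `crossValidationAccuracy` with an SVM trained using those parameters, and the winning pair would then be used to train the final model on the whole training set.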
A “grid-search” on C and γ using cross-validation is recommended. Different pairs of (C, γ) values are tried and only the pair with the best accuracy is chosen. In order to identify good parameters, exponentially growing sequences of C and γ are tried; for example, $C = 2^{-5}, 2^{-3}, \ldots, 2^{15}$ and $\gamma = 2^{-15}, 2^{-13}, \ldots, 2^{3}$.
Training the System
The videos used in the system are captured at the aquarium. The camera is placed on a tripod to keep it stationary. Since the tanks within the aquarium are very large, it is not easy to record a fish swimming at a constant distance from the camera all the time. Therefore, the only frames used in the system are those in which the fish appears at a reasonable distance from the camera and maintains this distance for at least three or four seconds. Since most fish appeared at the desired distance only once in the videos, some of the training and testing videos are the same. This is acceptable because the project lasts only one year and, though not impossible, capturing separate training and testing videos is a complicated and tedious process. Completely separate training and testing sets would require a smaller tank, because the shape and orientation of a fish change noticeably as it swims away from the camera. Each fish was trained with about 40 training samples, each containing both the shape and the colour data that is sent to the SVM. Each label in the SVM corresponds to a certain fish name, so when testing takes place, the SVM returns a value (the label) that is stored in a file; the system reads that label and prints the corresponding output.
Figure 15: Send Shape and Colour Distribution to SVM
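One way to store a fish's shape and colour features under its label is a plain-text training file in the sparse format used by LIBSVM, where each line reads `<label> <index>:<value> ...` with indices starting at 1. The thesis does not state which SVM implementation or file format was used, so the format, the feature ordering (shape first, then colour) and the function name `toLibsvmLine` are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Formats one training sample as a LIBSVM-style line "<label> 1:<v1> 2:<v2> ...".
// Shape features are written first, then the colour histogram (assumed ordering).
std::string toLibsvmLine(int label, const std::vector<double>& shape,
                         const std::vector<double>& colour) {
    std::ostringstream line;
    line << label;
    std::size_t index = 1;  // LIBSVM feature indices start at 1 and must increase
    for (double v : shape) line << ' ' << index++ << ':' << v;
    for (double v : colour) line << ' ' << index++ << ':' << v;
    return line.str();
}
```

Writing one such line per training sample (about 40 per fish) produces a file that LIBSVM's command-line tools can read; for real feature values, a higher stream precision (std::setprecision) avoids losing digits.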
System Classification
When the user clicks on a particular fish, its features (shape and colour distribution) are sent to the SVM. Because the SVM has been trained before testing, it has learned approximately what each fish’s features ‘look’ like. The SVM responds by giving the system a label; this label corresponds to a certain fish species, and the corresponding classification is displayed to the user. Figure 16 below illustrates the classification process.
Figure 16: System Classification
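Reading the predicted label back and turning it into the text shown to the user could look like the sketch below. It assumes the prediction file holds one numeric label per line (as LIBSVM's svm-predict writes when probability estimates are not requested) and uses a hypothetical label-to-name table that follows the "fish A has label 1" convention of Section 6.2.8.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <sstream>
#include <string>

// Converts one line of a prediction file (e.g. "2") into the fish name to display.
// Unknown or unparsable labels yield "unknown" rather than a wrong name.
std::string fishNameForPrediction(const std::string& predictionLine,
                                  const std::map<int, std::string>& names) {
    std::istringstream in(predictionLine);
    double label = 0.0;  // read as double so both "2" and "2.0" are accepted
    if (!(in >> label)) return "unknown";
    const auto it = names.find(static_cast<int>(std::lround(label)));
    return it == names.end() ? "unknown" : it->second;
}
```

The line itself would be obtained with std::getline from the prediction file after each click.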
CHAPTER 7
TESTING AND RESULTS
To assess the accuracy of the system, each fish was clicked on 10 times, giving a total of 200 clicks (10 clicks for each of 20 fish). The result of the test is represented in the graph below. The graph shows the percentage of clicks for which each individual fish is classified correctly, which contributes to the overall accuracy of the system. The overall accuracy of the system amounts to 88%. This is a reasonable result, taking into account the number of different types of fish that need to be classified.
Figure 17: Individual Accuracy (accuracy of individual fish, by fish type)
The graph above shows that some fish are classified with an accuracy of 100%. This may be because these particular fish have very distinctive shapes. Other fish have a low accuracy of 50% or less, possibly because they take on shapes that are similar to those of other fish. Nevertheless, the system’s overall performance is good.
CHAPTER 8
USER MANUAL
The demonstration mode of the GUI is illustrated in the figures that follow.
8.1 Starting the System
A window appears at start-up, asking whether the user wants to start the system. The system starts if the user clicks ‘yes’ and exits if the user clicks ‘no’.
8.2 Load Video
The system now requests a video from the user. After selecting a video, the user clicks ‘Open’. The system then displays the selected video on the screen, and the user can interact with the system by clicking on any fish within the video.
CHAPTER 9
CODE DOCUMENTATION
The code has been fully documented: comments were inserted at each statement and each method. A description of all inputs and outputs is given, as well as the caveats of all methods. The final source code will be stored on a CD and placed in an envelope.
Chapter 2 gives a detailed description of the problem as well as the expected software solution. The user requires an easy-to-use, interactive system. Since the system includes a GUI, it is simple and easy to use, because the user navigates through the system by mouse clicks instead of typing commands; this also makes the system interactive. The stated problem is that visitors at the aquarium do not have instant access to information about specific fish. The final system meets this requirement by providing an easy-to-use, interactive system that gives the user instant information about specific fish. Such a system is also educational and attracts people because it is interactive.
