

A Project Report on
Sentimental Analysis Using Facial Recognition
Submitted in partial fulfillment of the requirements for the degree of
BACHELOR OF TECHNOLOGY
in
Computer Science and Engineering
by
Abhilash Premkumar
1560411
BACHELOR OF TECHNOLOGY
in
Information Technology
by
Ashish Jonnalagadda
1560928
Sameer Kumar N
1560921
Under the Guidance of
Michael Moses T
Department of Computer Science and Engineering
Faculty of Engineering, CHRIST (Deemed to be University),
Kumbalagudu, Bengaluru - 560 074
April-2019
Faculty of Engineering
Department of Computer Science and Engineering
CERTIFICATE
This is to certify that Ashish Jonnalagadda (1560928) has successfully completed the
project work entitled “Sentimental Analysis Using Facial Recognition” in partial fulfillment for the award of Bachelor of Technology in Computer Science and Engineering during the year 2018-2019.
Michael Moses T
Assistant Professor

Dr Balachandran K
Head of the Department

Dr Iven Jose
Dean
Faculty of Engineering
Department of Computer Science and Engineering
BONAFIDE CERTIFICATE
It is to certify that this project titled "Sentimental Analysis Using Facial Recognition" is the bonafide work of

Name: Ashish Jonnalagadda
Register Number: 1560928

Examiners [Name and Signature]:
1.
2.

Name of the Candidate :
Register Number :
Date of Examination :
Acknowledgement
I would like to thank CHRIST (Deemed to be University) Vice Chancellor, Dr Rev. Fr.
Abraham V M, Pro Vice Chancellor, Dr Rev. Fr. Joseph CC, Director of Faculty of
Engineering, Dr Rev. Fr. Benny Thomas and the Dean Dr Iven Jose for their kind
patronage.
I would like to express my sincere gratitude and appreciation to the Head of the Department of Computer Science and Engineering, Faculty of Engineering, Dr Balachandran K, for giving me the opportunity to take up this project.
I am also extremely grateful to my guide, Michael Moses T, who supported and helped me carry out the project. His constant monitoring and encouragement helped me keep up with the project schedule.
Declaration
We hereby declare that the project titled "Sentimental Analysis Using Facial Recognition" is a record of original project work undertaken by us for the award of the degree of Bachelor of Technology in the Department of Computer Science and Engineering. We have completed this study under the supervision of Michael Moses T, Department of Computer Science and Engineering.
We also declare that this project report has not been submitted for the award of any degree, diploma, associateship, fellowship or other title anywhere else. It has not been sent for any publication or presentation purpose.
Place: Faculty of Engineering, CHRIST (Deemed to be University), Bengaluru
Date:
Name: Ashish Jonnalagadda
Register Number: 1560928
Signature:
Abstract
The applications of sentiment analysis are powerful and broad. Most current research is concerned with analysing the sentiment of textual data, while comparatively few researchers work on sentiment analysis of image data. Some researchers have used handcrafted image features, while others have used Convolutional Neural Network (CNN) features. Emotional analysis refers to the organisation of data (images, videos, text, etc.) so as to understand the mindsets or feelings expressed in forms such as sad, happy, neutral, disgust, etc. Each phase of this project addresses emotional analysis, its dimensions and the various methodologies used for it. Sentiment analysis is one of the best-known applications in the processing of image data, and in combination with machine learning and deep learning algorithms it becomes progressively more successful; it is used by a substantial number of businesses to build their productivity and to obtain better information on a person's emotional state. The first phase of our project detects a single face and classifies its sentiment, using the Viola-Jones algorithm for detection and a Support Vector Machine (SVM) as the classifier for the emotion of that face. The second phase detects multiple faces present in an image or in a frame of a live feed. Since the Viola-Jones algorithm can only be applied to a single upright face, we switched to a convolutional neural network (CNN) in order to detect multiple faces in a single image. Hence, this work features recent studies on the implementation of deep learning techniques for compelling emotional analysis.
Keywords: Sentiment Analysis, Convolutional Neural Network, Machine Learning, Deep Learning, Support Vector Machine.
Contents

CERTIFICATE
BONAFIDE CERTIFICATE
ACKNOWLEDGEMENT
DECLARATION
ABSTRACT
LIST OF FIGURES
GLOSSARY

1 INTRODUCTION
  1.1 Problem Formulation
  1.2 Problem Identification
  1.3 Problem Statement and Objectives
  1.4 Limitations

2 RESEARCH METHODOLOGY
  2.1 Digital Image Processing
  2.2 Phases in Facial Expression Recognition
    2.2.1 Face Detection
    2.2.2 Feature Extraction
    2.2.3 Face Recognition
  2.3 Theory of Digital Image Processing
    2.3.1 Representation of Digital Image
          Need for Processing Digital Images
    2.3.2 Steps in Digital Image Processing
          Image Acquisition
          Image Enhancement
          Image Compression
          Histogram Equalization

3 LITERATURE SURVEY AND REVIEW
  3.1 Literature Collection and Segregation
  3.2 Critical Review of Literature

4 ACTUAL WORK
  4.1 Methodology for the Study
  4.2 Experimental and Analytical Work Completed in the Project
  4.3 Modeling, Analysis and Design
    4.3.1 Theoretical Analysis of the Classifiers
          Cascading Classifiers
          CNN

5 RESULTS, DISCUSSIONS AND CONCLUSIONS
  5.1 Results and Analysis
  5.2 Comparative Study
  5.3 Discussions
  5.4 Conclusions
  5.5 Scope for Future Work

BIBLIOGRAPHY

A Code
  MATLAB
    Typical Uses of MATLAB
    Introduction
LIST OF FIGURES

1.1 Layers of CNN (Courtesy: Google images)
1.2 Phases of Sentimental Analysis
2.1 Three main phases of Face recognition
2.2 Digital Image
2.3 Digital image processing applications
2.4 Fundamental Steps in Digital Image Processing
2.5 Compression of digital image
4.1 Design
4.2 The basic LBP operator
4.3 Detection Model
4.4 Emotion Model
5.1 Phase 1 Output
5.2 Phase 2 Output
5.3 Confusion matrix of the system
A.1 Phase 1 MATLAB code
A.2 Phase 1 MATLAB code
A.3 Phase 1 MATLAB code
A.4 Phase 1 MATLAB code
A.5 Phase 1 MATLAB code
A.6 Phase 2 Python code
A.7 Phase 2 Python code
GLOSSARY

AI   Artificial Intelligence
SVM  Support Vector Machine
EA   Emotional Analysis
LBP  Local Binary Patterns
ML   Machine Learning
CNN  Convolutional Neural Networks
Chapter 1
INTRODUCTION
Sentiments can be exhibited in the form of text, videos or images. Many research papers are available on text sentiment analysis, but image sentiment analysis is still not much explored. With the rapid growth of people engaging on social media to express emotions, it has become one of the most critical areas of research, and over the past few years a great deal of work has focused on achieving optimum results for it. Various methods and algorithms have been proposed for image sentiment analysis; these are broadly classified into Machine Learning based algorithms and Lexicon based algorithms. Machine Learning based algorithms include Support Vector Machines (SVM), Neural Networks, Naïve Bayes, Bayesian Networks and Maximum Entropy, while Lexicon based algorithms include statistical and semantic techniques. Face detection plays a major role in recognising faces in face-based image analysis and is one of the fundamental problems in computer vision. The performance of different face-based applications, from face identification and verification to face clustering, tagging and retrieval, depends on accurate and efficient face detection. Recently there has been substantial research on face detection focused on faces in uncontrolled settings, which are challenging due to variations at the subject level (e.g., a face can have many different poses), category level (e.g., adult and baby) and image level (e.g., illumination and cluttered background).
A CNN is a feed-forward neural network used extensively for image processing, image classification and image prediction, which makes it one of the most important tools in the analysis of images or visuals. For image sentiment analysis with a CNN, a sequence of functions is performed: a convolutional layer, followed by a nonlinear layer, a pooling layer and a fully connected layer [2].
The primary layer in CNN image classification is the convolutional layer. As the image is taken as input, analysis starts (say) from the top left corner, and small segments or matrices of the image are matched against filters. There are usually multiple convolutional layers: the image passes through one convolutional layer, and the output of that layer becomes the input of the next. The nonlinear layer is the second function in the sequence, after the convolution operation. It consists of an activation function which gives the CNN its nonlinear behaviour. The pooling layer follows the nonlinear layer and reduces the workload by reducing the features of an image; the image volume is reduced if the given image is large. For example, if a feature has already been identified in a previous convolution operation, it need not be processed further for identification. This is referred to as down-sampling or sub-sampling. After the pooling layer, the fully connected layer is introduced. It takes the output of the convolutional layers, flattens the matrix into a vector and feeds it into the fully connected layer. As mentioned, a CNN consists of one or more convolutional and sub-sampling layers followed by a fully connected layer, as in a standard neural network. A CNN architecture with 6 layers can be seen in Figure 1.1 [4].
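The convolution, nonlinearity and pooling sequence described above can be sketched in a few lines of NumPy. This is a toy illustration, not the project code: the 6x6 "image", the 2x2 filter and the layer sizes are all made up for demonstration.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    """Nonlinear (activation) layer."""
    return np.maximum(0, x)

def max_pool(x, size=2):
    """2x2 max pooling: down-samples the feature map."""
    h, w = x.shape
    return x[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1., 0.], [0., 1.]])            # toy 2x2 filter
features = max_pool(relu(conv2d(image, kernel)))
print(features.shape)  # (2, 2): 6x6 -> conv gives 5x5 -> pooling gives 2x2
```

In a real CNN the pooled feature maps would then be flattened into a vector and passed to the fully connected layer, exactly as the text describes.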
FIGURE 1.1: Layers of CNN
Courtesy: Google images
A Convolutional Neural Network is a class of deep, feed-forward artificial neural networks that has decidedly proven practical for various investigations. A CNN accepts input in the form of single words, which requires additional time and effort. The Morphological Sentence Pattern (MSP) model distinguishes the features and expressions of sentences and helps in forming shorter patterns. A combination of the MSP model and a CNN gives improved sentiment analysis.
Earlier face detection research can be seen as a history of sampling the output space more efficiently to a solvable scale and evaluating each configuration more effectively. One natural idea to achieve this is a cascade, where classifiers with low computational cost are applied first to shrink the background while keeping the faces. The pioneering work of Viola and Jones popularised this approach, which combines classifiers in different stages so that background regions are quickly discarded while more computation is spent on promising face-like regions.
Haar cascades are used to extract features from the images in the system, since cascading can handle both negative and positive samples of data. The cascaded CNN for face detection in [6] contains three stages. In every stage, one detection network and one alignment network are used, giving six CNNs in total. In practice, this makes the training procedure rather complex: the training samples must be carefully set up for every stage and the networks improved one by one. In the investigation of cascaded CNNs, joint training sets are utilised since there are separate networks for detection and alignment. With cascading classifiers, the sub-windows which are not rejected by the initial classifier are handled by a sequence of classifiers, each slightly more complex than the last. If any classifier rejects the sub-window, no further processing is performed. The cascading algorithm was chosen because it is the fastest at processing real-time images and rejects false positive detections in the early stages [8]. The computation time required for processing the features is also low, allowing its wide use in different applications involving face detection.
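The early-rejection behaviour of a cascade can be illustrated with a short sketch. The stage classifiers below are hypothetical stand-ins (real stages evaluate boosted Haar features over a sub-window); only the control flow, with cheap stages first, immediate rejection, and the most selective stage last, mirrors the algorithm described above.

```python
def cascade_classify(window, stages):
    """Return True (face) only if every stage accepts the window."""
    for stage in stages:
        if not stage(window):
            return False  # rejected early: no further processing on this window
    return True

# Hypothetical stages: each maps a window (reduced here to a single
# brightness value) to accept/reject. Real stages would be boosted
# Haar-feature classifiers of increasing complexity.
stages = [
    lambda w: w > 10,           # stage 1: very cheap, removes dark background
    lambda w: w < 200,          # stage 2: removes saturated regions
    lambda w: 80 <= w <= 150,   # stage 3: most selective, runs last
]

windows = [5, 120, 230, 100]
detections = [w for w in windows if cascade_classify(w, stages)]
print(detections)  # [120, 100] - two windows survive all three stages
```

Note how windows 5 and 230 never reach the expensive final stage: that early rejection of false positives is exactly the property cited above as the reason for choosing the cascading algorithm.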
Since the system is aimed at assisting administration and management in the medical field by prioritising patients at any hospital, clinic or help centre, by detecting the faces of the patients and extracting their sentiment (pain), the methods, techniques and concepts mentioned above are used in the development of the system. The system will automate the process of assisting patients, eliminating the use of token and queue systems. Management and maintenance will also prove to be effective and easy with the system in hand.
FIGURE 1.2: Phases of Sentimental Analysis
1.1 Problem Formulation

Human facial expressions can be classified into 7 basic sentiments: happy, sad, surprise, fear, anger, disgust and neutral. Facial emotions in human beings are expressed through the activation of specific changes in the facial muscles. These sometimes subtle, yet complex, signals in an expression often contain a lot of information about our state of mind. With the help of facial sentiment recognition, we are able to measure the effects that content and services have on users through a simple and cost-friendly procedure. For instance, many retailers use these metrics to evaluate buyer interest. Healthcare providers can offer better service by using additional information about a patient's emotional state during treatment. Entertainment producers can track audience engagement in events to consistently create better content.
Human beings are usually well trained at reading emotions; in fact, research has found that a 14-month-old baby could detect whether a person was sad or happy. But can computers do a better job than us at assessing emotional states? To answer this question, a deep learning neural network was designed which gives machines the ability to make inferences about our emotional states. In other words, we give them eyes to see what we can see.
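Assuming a classifier that returns one score per class (SVM decision values in phase 1, or CNN softmax outputs in phase 2), the seven sentiments above map onto its output as follows. The score values here are invented purely for illustration.

```python
import numpy as np

# The seven basic sentiments, in the order listed above.
EMOTIONS = ["happy", "sad", "surprise", "fear", "anger", "disgust", "neutral"]

def to_label(scores):
    """Map a 7-element score vector (e.g. SVM decision values or
    CNN softmax outputs) to the name of the winning sentiment."""
    return EMOTIONS[int(np.argmax(scores))]

# Hypothetical scores for one detected face: "happy" dominates.
scores = np.array([0.70, 0.08, 0.02, 0.04, 0.01, 0.05, 0.10])
print(to_label(scores))  # happy
```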
1.2 Problem Identification

Sentiment analysis is very valuable in social media monitoring, as it provides an overview of the wider public opinion behind specific topics. The capacity to extract insights from social data is a practice adopted by organisations across the world. This is not to say, however, that sentiment analysis is a perfect science by any means. Human language is complex: training a machine to analyse the various grammatical errors, cultural variations, slang and misspellings that occur in online mentions is a difficult process, and training a machine to understand how context affects tone is more difficult still. Humans are fairly intuitive when it comes to interpreting the tone of a piece of writing. It is understood that for the majority of people a delayed flight is definitely not a good experience (unless there is a free bar as compensation). By applying this contextual understanding to the sentence, we can easily recognise the sentiment as negative. Without contextual understanding, a machine looking at such a sentence might see "brilliant" and classify it as positive. Machine learning techniques and the field of natural language processing both have their role to play in the future of sentiment analysis. There is a great deal of work to be done, but improvements are being made every day. As with any automated process, it is prone to mistakes and frequently needs a human eye to watch over it and reclassify sentiment that has been improperly labelled.
1.3 Problem Statement And Objectives

The success of service robotics depends strongly on a smooth system-to-user interaction. A bot or system should therefore be able to extract information from the face of its user alone, such as identifying the emotional state or inferring gender. Interpreting any of these elements correctly using machine learning (ML) techniques has proven to be complicated due to the high variability of the samples within each task. This leads to models with millions of parameters trained on millions of samples. Moreover, human accuracy at classifying a face image into one of 7 different emotions is about 65%. One can appreciate the difficulty of this undertaking by attempting to manually classify the FER-2013 dataset images into the classes "angry", "disgust", "fear", "happy", "sad", "surprise" and "neutral". Notwithstanding these challenges, bot platforms built to converse and handle household tasks require facial expression systems that are robust and computationally efficient. In addition, the state-of-the-art methods in image-related tasks such as image classification and object detection are all based on Convolutional Neural Networks (CNNs).
These tasks require CNN architectures with a huge number of parameters; as a result, their deployment on robot platforms and real-time systems becomes infeasible. We propose and implement a general CNN building framework for designing real-time CNNs. The implementation has been validated in a real-time facial expression system that provides face detection and gender classification, and achieves human-level performance when classifying emotions. Furthermore, CNNs are often used as black boxes whose learned features remain hidden, making it complicated to establish a balance between classification accuracy and parameters. We therefore implemented a real-time visualisation of the guided-gradient backpropagation proposed by Springenberg in order to validate the features learned by the CNN.
• To develop a facial expression recognition system.
• To experiment with machine learning algorithms in computer vision.
• To detect emotion thus facilitating Intelligent Human-Computer Interaction.
1.4 Limitations

• Like all opinions, sentiment inherently differs from individual to individual, and can even be thoroughly irrational.
• It is important to mine a large and relevant sample of information when attempting to quantify sentiment.
• A person's sentiment toward a brand or item may be influenced by one or more causes.
• Since sentiment very likely changes over time according to an individual's state of mind, world events, etc., it is normally essential to look at information from the standpoint of time.
• It is an incredibly difficult problem: sarcasm and other kinds of ironic language are inherently tricky for machines to detect when looked at in isolation.
Chapter 2
RESEARCH METHODOLOGY
Image processing commonly refers to digital image processing, but optical and analog image processing are also possible. This chapter sets out general strategies that apply to all of them. The acquisition of pictures (producing the input image in the first place) is referred to as imaging.
In this procedure, a photo is captured using a camera to create a digital or analog picture. To produce a physical picture, the photo is processed using the appropriate technology based on the input source type. In digital format, the image is saved in forms such as JPG, PNG or BMP. The colours and textures are all captured at the time the photograph is taken, and software interprets this data into an image. Common operations are as follows:
• Colour corrections, including brightness and contrast modifications, colour mapping, colour balancing, quantization, or translation to a different colour space.
• Image registration, the alignment of two or more images.
• Image differencing and morphing.
• Image recognition, which in some cases may extract text from the photo using optical character recognition, or checkbox and bubble values using optical mark recognition.
• Image segmentation.
• High dynamic range imaging, by combining multiple images.
2.1 Digital image processing

Digital image processing is the use of computer algorithms to carry out image processing on digital images. Digital image processing has many benefits over analog image processing: it allows a much wider variety of algorithms to be applied to the input data, and it can avoid problems such as the build-up of noise and distortion during processing. Since images are defined over two dimensions (and sometimes more), digital image processing can be modelled in the form of multidimensional systems.
The cost of processing was fairly high with earlier generations of computing equipment, until the early 1970s, when digital image processing proliferated as cheaper computers and dedicated hardware became available. Images could then be processed in real time for some dedicated problems such as television standards conversion. As general-purpose computers became faster, they began to take over the role of dedicated hardware for all but the most specialised and computation-intensive operations.
With the faster computers and signal processors available in the 2000s, digital image processing has become the most common form of image processing, and it is generally used because it is not only the most versatile approach but also the least expensive. Digital image processing permits the use of much more complex algorithms for picture processing, and can therefore offer both more sophisticated performance at simple tasks and the implementation of methods which would be impossible by analog means.
Facial expression analysis has been attracting considerable attention in the advancement of human-machine interfaces, since it provides a natural and efficient way for humans and machines to communicate [2]. Understanding human facial expressions and their formulation has many aspects, from computer analysis and emotion recognition to lie detection, airport security, nonverbal communication, and even the role of expressions in art. Application areas related to the face and its expressions include personal identification and access control, video telephony and teleconferencing, forensic applications, human-computer interaction, automated surveillance, cosmetology, and so on. The quality of the face detection stage, however, directly affects the performance of all these applications.
2.2 Phases in Facial Expression Recognition

In order to analyse facial expressions, there are three fundamental stages: Face Detection, Feature Extraction and Face Recognition. The end results of these stages are combined to obtain the correct solution.
FIGURE 2.1: Three main phases of Face recognition
Figure 2.1 shows the three main phases of face recognition. Many methods have been developed for face recognition. Even though a facial expression system is hard to decompose into parts, these stages help to give a broad overview and perspective. Another important advantage of partitioning the system this way is that we can devise an efficient approach to deal with the issues arising in each of the stages.
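The three stages can be wired together as a simple pipeline. The stage functions below are placeholder stubs with made-up return values; they are not the project's implementation, and are meant only to show how the output of each phase feeds the next.

```python
def detect_faces(image):
    """Stage 1, Face Detection: return bounding boxes of candidate faces.
    Stubbed: a real detector (Viola-Jones, CNN) would scan the image."""
    return [(10, 10, 64, 64)]  # hypothetical (x, y, w, h) box

def extract_features(image, box):
    """Stage 2, Feature Extraction: compute a feature vector for one face.
    Stubbed: real features might be Haar or LBP descriptors."""
    return [0.1, 0.5, 0.2]  # hypothetical feature values

def recognize(features):
    """Stage 3, Face Recognition: classify the feature vector.
    Stubbed: a real classifier (SVM, CNN) would score each class."""
    return "happy"

def pipeline(image):
    """Chain the three stages: every detected box is featurised and classified."""
    results = []
    for box in detect_faces(image):
        feats = extract_features(image, box)
        results.append((box, recognize(feats)))
    return results

print(pipeline(image=None))  # [((10, 10, 64, 64), 'happy')]
```

The partitioning advantage noted above shows up directly here: any single stage can be swapped for a better implementation without touching the other two.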
2.2.1 Face Detection

Face detection is a computer technology, used in a variety of applications, that identifies human faces in digital images. Face detection also refers to the psychological process by which humans locate and attend to faces in a visual scene. The objective of this project is to detect human faces in a digital image and identify the sentiment of each face. Facial detection involves separating image regions into two classes: one containing the face and the other containing the background. It is hard because, although similarities exist between faces, they can vary considerably in terms of age, skin colour and facial expression.
Feature based techniques: the feature based approaches use facial features to drive the detection operation.
Image based techniques: these consist of various approaches such as neural networks, example based learning and support vector machines.
2.2.2 Feature Extraction

The objective was to design and implement a face detector that will detect human faces in an image, similar to the training images. To recognise the facial part of a picture in a scene, we utilise the face detection scheme proposed by Viola and Jones. The Viola-Jones face detector comprises a cascade of classifiers, where every classifier uses simple image filters, which is fundamental to the speed of the detector.
Object detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features" [18]. It is a machine learning based approach where a cascade function is trained from a large set of images with and without faces. It is then used to detect objects in other images.
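Much of the Viola-Jones detector's speed comes from the integral image, which lets the sum of any rectangle, and hence any Haar-like feature, be computed in four array lookups regardless of the rectangle's size. A minimal NumPy sketch on a toy 4x4 image:

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[0:y, 0:x]; zero-padded so box sums need no edge cases."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) using four lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16.0).reshape(4, 4)  # toy 4x4 "image"
ii = integral_image(img)

# A two-rectangle Haar-like feature: left half minus right half of a window.
# Its response is large on vertical edges, near zero on flat regions.
left = box_sum(ii, 0, 0, 4, 2)
right = box_sum(ii, 0, 2, 4, 4)
print(left - right)
```

Each cascade stage evaluates many such rectangle differences, so computing them in constant time is what makes the cascade cheap enough for real-time use.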
2.2.3 Face Recognition

Facial recognition is a biometric technique for identifying a person by comparing live capture or digital image data with the stored information for that individual. Facial recognition systems are regularly used for security purposes but are increasingly being used in a variety of other applications. It is normally employed in security systems and can be compared with other biometrics, such as fingerprint or iris recognition systems [7]. In recent times it has also become a very popular commercial identification and marketing tool. Human face recognition can be done passively, without any explicit action or participation on the part of the user, since images can be obtained from a distance by a camera. This is especially useful for security and surveillance purposes.
Facial identification or recognition systems are normally used for security purposes but are increasingly being used in a variety of other applications. Most recent facial recognition systems work with numeric codes called faceprints. Currently, a great deal of facial recognition development is centred on mobile phone applications. Mobile facial recognition capabilities include image tagging and other social networking integration purposes as well as personalised marketing. Facial recognition software likewise improves marketing personalisation. Popular recognition algorithms include principal component analysis using eigenfaces, linear discriminant analysis, elastic bunch graph matching using the Fisherface algorithm, the hidden Markov model, multilinear subspace learning using tensor representation, and the neuronally motivated dynamic link matching.
2.3 Theory of Digital Image Processing

An image is represented technically as a two dimensional function f(x, y), where f denotes the intensity of the selected pixel and the coordinates (x, y) give the exact position of that pixel in the digital image. Informally, it is said that "a picture is not an image without an object in it".
FIGURE 2.2: Digital Image
2.3.1
Representation of Digital image
Generally, a digital image is represented in pixels, which are the minute elements of a photograph. In an 8-bit grayscale image, each pixel is a combination of 8 bits, ranging from the most significant bit (MSB) down to the least significant bit (LSB). An interesting point is that the most significant bits carry most of the visually important information, while the least significant bits carry the fine detail. Whenever an image is subject to noise or to other variations in brightness, contrast, or resolution, the effect falls mainly on the least significant bits, precisely because they encode these small variations. The bits in a pixel are arranged in order of significance, and the perceived intensity of the digital image depends on the correct arrangement of these bits so that the image can be visualized properly by the human visual system (HVS).
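As an illustration of the bit-level behavior described above, the following sketch (using NumPy, with a tiny synthetic 2x2 "image" of arbitrary values) shows that low-amplitude noise disturbs the least significant bit-plane while leaving the most significant bit-plane untouched:

```python
import numpy as np

def bit_plane(image, k):
    """Extract bit-plane k (0 = LSB, 7 = MSB) of an 8-bit grayscale image."""
    return (image >> k) & 1

# A tiny synthetic 8-bit "image" (values are arbitrary).
img = np.array([[200, 201], [50, 51]], dtype=np.uint8)

# Low-amplitude noise (+1 at two pixels) flips the LSB plane
# but leaves the MSB plane untouched.
noisy = img + np.array([[1, 0], [0, 1]], dtype=np.uint8)

print(bit_plane(img, 7), bit_plane(noisy, 7))  # identical MSB planes
print(bit_plane(img, 0), bit_plane(noisy, 0))  # LSB planes differ
```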
Need for processing digital images: Digital image processing plays a prominent role in many application-oriented fields such as the military, biometrics, robotics, genetics, radar image processing, satellite image processing, and medical image processing. An image may change its behavior from its normal form because of variations in brightness, contrast, or resolution, or because of environmental disturbances (modeled as Gaussian noise) and human errors such as applying the wrong algorithm or hand jitter. The medical field in particular is an application in which processing plays a vital role in helping the doctor understand the patient's condition; this is only possible once the data are in a proper form, and if processing is not performed, many medical applications fail.
FIGURE 2.3: Digital image processing applications
FIGURE 2.4: Fundamental Steps in Digital Image Processing
2.3.2
Steps in Digital image processing
Image Acquisition: Images are acquired by means of sensors, radars, satellites, cameras, and so on. Although acquisition appears to be a simple step, it is in fact a difficult task. It mainly involves two operations: compression of the image and enhancement of the image. Whenever an object is captured by a digital sensing device, the device first digitizes (and essentially losslessly compresses) the image, and then enhances it according to the resolution of the device, for a better view of the image by the human visual system.
Image Enhancement: Owing to continuous variation in lighting conditions and other factors, we often acquire low-quality images rather than high-quality ones. To improve the quality of such images, we have to adjust the several parameters and factors associated with the image, such as contrast, brightness levels, and the amount of noise. Naturally a question arises as to why we should convert a low-quality digital image into a high-quality one. To enhance a low-quality digital image, we opt for the better enhancement techniques that already exist. Among these, the most successful and widely used is contrast enhancement; the main issue to consider while enhancing low-quality images is that the chosen technique must adapt to the display on which the image is viewed. In the literature, many frameworks and algorithms have been proposed, but most of them are based on enhancement of this kind.
Image Compression: Digital image compression plays a prominent role in many image-processing applications. The compression of an image depends upon several important factors, such as the image energy, which mainly depends on its brightness and contrast levels. It is especially important in applications like steganography and watermarking, in which data are embedded or hidden in the image; after the data have been successfully embedded, the image is compressed before transmission for security reasons.
FIGURE 2.5: Compression of digital image
A digital image may be represented in two distinct ways. In the first, one can view the content of the image (the objects in it) but not the individual pixels and their values; this is the digital image itself. In the second, one can view the pixels and their values but not the content; this is the histogram. The major advantage of the histogram of a digital image is that, by viewing its data, one can obtain a clear estimation of the features of the respective image. In the Histogram Equalization (HE) method, all the pixel intensities are equalized by employing a normalization approach. When assessing power consumption, however, the Histogram Equalization technique is mainly affected by the pixel intensities rather than by the backlight intensities; consumption models have therefore been built on top of Histogram Equalization.
Histogram Equalization: A histogram has many different aspects, such as histogram rotation, histogram distribution, histogram shifting, and histogram equalization. Histogram equalization plays a crucial role in many digital processing applications; when the important information in a digital image lies in closely spaced contrast values, histogram equalization can increase the global contrast of the image. By equalizing the pixel intensities of a digital image, they are distributed in a way that is better visualized by the human visual system. The major advantage of histogram equalization is that it equalizes all pixel values, so that pixels with low intensities obtain a better visual appearance; this is accomplished by spreading the most frequent intensity values over the full available range.
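Histogram equalization as described above can be sketched in a few lines of NumPy. This is a minimal illustration of the classical CDF-based mapping, assuming an 8-bit, non-constant image; it is not the exact procedure used in the project:

```python
import numpy as np

def equalize(image):
    """Histogram-equalize an 8-bit grayscale image via the classical
    CDF mapping (assumes the image is not constant)."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first non-zero CDF value
    # Stretch the occupied part of the CDF over the full range [0, 255].
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255), 0, 255)
    return lut.astype(np.uint8)[image]

# A low-contrast image whose intensities cluster in [100, 103] ...
low = np.array([[100, 101], [102, 103]], dtype=np.uint8)
# ... is spread over the whole dynamic range.
print(equalize(low))  # [[0 85] [170 255]]
```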
Chapter 3
LITERATURE SURVEY AND
REVIEW
Most commonly used CNNs for feature extraction include a set of fully connected layers at the end, and these fully connected layers tend to contain most of the parameters in a CNN. In particular, VGG16 contains approximately 90% of all its parameters in its last fully connected layers[10]. Recent architectures such as Inception V3 have reduced the number of parameters in their last layers by including a Global Average Pooling operation[12]. Global Average Pooling reduces each feature map to a scalar value by taking the average over all elements in the feature map. The averaging operation forces the network to extract global features from the input image.
A few issues, such as accurate analysis, low-cost modeling, low-complexity design, seamless transmission, and sufficient storage, ought to be addressed while building a complete healthcare framework. One proposed patient-state recognition system for healthcare[7] is designed so that it gives good recognition accuracy, is easy to model, and is scalable. Speech and video inputs are handled independently during feature extraction and modeling; these two data modalities are merged at score level, where the scores are obtained from the models of the various patient states. The automatically recognized state of a patient enables medical caregivers to take appropriate action promptly.
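Score-level fusion of the kind described above can be illustrated with a short sketch; the state labels, scores, and fusion weight below are all hypothetical, chosen only to show the mechanism, not taken from the cited work:

```python
import numpy as np

# Hypothetical per-state scores from two independently trained models;
# the state labels are illustrative, not taken from the cited paper.
states = ["normal", "pain", "distress"]
speech_scores = np.array([0.2, 0.5, 0.3])
video_scores = np.array([0.6, 0.3, 0.1])

# Score-level fusion: a weighted sum of the two modalities' score vectors.
# The weight w would normally be tuned on a validation set; 0.4 is arbitrary.
w = 0.4
fused = w * speech_scores + (1 - w) * video_scores
print(states[int(np.argmax(fused))])  # -> "normal"
```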
The use of social networks has grown exponentially in recent years, and the large amount of data available in these networks can be effectively utilized in many machine-learning applications. In one cloud-based framework, an interlaced derivative pattern is used for feature extraction, while an extreme learning machine is utilized as the base classifier. Once the emotion is recognized in the cloud, it can be shared with the end users to match their interests[21]. Several experiments were performed using some publicly available databases and heterogeneous images from social networks, and the results showed that the proposed framework can be used effectively for emotion recognition.
3.1
Literature Collection And Segregation
Our capacity to discriminate between basic facial expressions of emotion develops between infancy and early adulthood, yet few studies have investigated the developmental trajectory of emotion recognition using a single method over a wide age range[15]. One study examined the development of emotion-recognition abilities through childhood and adolescence, testing the hypothesis that children's ability to recognize basic emotions is modulated by chronological age, pubertal stage, and gender. To establish norms, 478 children aged 6-16 years were assessed using the Ekman-Friesen Pictures of Facial Affect. These cross-sectional data were then modeled in terms of accuracy in recognizing the six emotions considered, after controlling for the positive association between emotion recognition and IQ. Significant linear trends were found in children's ability to recognize facial expressions of happiness, surprise, fear, and disgust, with improvement at increasing ages. In contrast, for sad and angry expressions there was little or no change in accuracy over the age range 6-16 years; near-adult levels of proficiency are established by middle childhood. In an examined subset, pubertal status influenced the ability to recognize facial expressions of disgust and anger, with an increase in competence from mid to late puberty that occurred independently of age. A small female advantage was found in the recognition of some facial expressions. The normative data provided in this study will help clinicians and researchers assess the emotion-recognition abilities of children and will facilitate the identification of abnormalities in a skill that is frequently impaired in neurodevelopmental disorders. If emotion-recognition abilities are a good model with which to understand adolescent development, then these results could have implications for the education, mental-health provision, and legal treatment of young people.
The neuropeptide oxytocin has recently been shown to enhance eye gaze and emotion recognition in healthy men[16]. A randomized, double-blind, placebo-controlled trial examined the neural and behavioral effects of a single dose of intranasal oxytocin on emotion recognition in individuals with Asperger syndrome (AS), a clinical condition characterized by impaired eye gaze and facial emotion recognition. Using functional magnetic resonance imaging, the study examined whether oxytocin would improve emotion recognition from the eye versus the mouth region of the face, and modulate regional activity in brain areas associated with face perception, in adults with AS and in a neurotypical control group. Intranasal administration of oxytocin improved performance in a facial emotion recognition task in individuals with AS. This was linked to increased left amygdala reactivity in response to facial stimuli and increased activity in the neural network involved in social cognition. The data suggest that the amygdala, together with functionally associated cortical areas, mediates the positive effect of oxytocin on social cognitive functioning in AS.
3.2
Critical Review of Literature
Commonly used CNNs for feature extraction include a set of fully connected layers at the end, and these layers tend to contain the majority of the parameters in a CNN[18]. In particular, VGG16 contains roughly 90% of all its parameters in its last fully connected layers. Later architectures, such as Inception V3, reduced the number of parameters in their last layers by including a Global Average Pooling operation, which reduces each feature map to a scalar value by taking the average over all its elements; this averaging forces the network to extract global features from the input image. Modern CNN architectures such as Xception leverage the combination of two of the most successful experimental ideas in CNNs: the use of residual modules and of depth-wise separable convolutions. Depth-wise separable convolutions reduce the number of parameters further by separating the processes of feature extraction and feature combination within a convolutional layer. Moreover, the state-of-the-art model for the FER-2013 dataset is based on a CNN trained with a squared hinge loss. This model achieved an accuracy of 71% using approximately 5 million parameters; in this architecture, 98% of all parameters are located in the last fully connected layers. The second-best methods reported achieved an accuracy of 66% using an ensemble of CNNs.
Chapter 4
ACTUAL WORK
In the proposed system, there are two models, which we assessed according to their test accuracy and number of parameters; both were designed to maximize the ratio of accuracy to the number of parameters. Our first model involves extracting sentiments or emotions from a single face in a given image. The second is designed to detect multiple faces in a given image and classify their emotions.
Following the above design guidelines, our initial architecture used Global Average Pooling to completely remove any fully connected layers. This was accomplished by giving the last convolutional layer the same number of feature maps as there are classes, and applying a softmax activation function to each reduced feature map. The model was trained on the FER-2013 dataset, which contains 35,887 grayscale images, each belonging to one of the classes "angry", "disgust", "fear", "happy", "sad", "surprise", or "neutral". Our initial model achieved an accuracy of 66% on this dataset. We will refer to this model as the "sequential fully-CNN".
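The classification head described above (one final feature map per class, reduced by Global Average Pooling and passed through a softmax) can be sketched in NumPy; the map sizes and random values below are illustrative only:

```python
import numpy as np

def gap_softmax(feature_maps):
    """Classification head of the fully-convolutional model: one final
    feature map per class, reduced to a scalar by Global Average Pooling,
    then normalized with a softmax."""
    logits = feature_maps.mean(axis=(1, 2))   # (classes, H, W) -> (classes,)
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

# Toy example: 7 emotion classes, each with a 6x6 final feature map.
rng = np.random.default_rng(0)
maps = rng.normal(size=(7, 6, 6))
probs = gap_softmax(maps)
print(probs.shape, round(float(probs.sum()), 6))  # (7,) 1.0
```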
4.1
Methodology for the Study
Local binary patterns are feature vectors extracted from a grayscale image by applying a local texture operator at every pixel and then using the results of the operator to form histograms, which are the feature vectors. The original LBP operator is computed as follows: given a 3x3 neighborhood of pixels as shown in Figure 4.1, a binary pattern is created for the neighborhood by comparing the center pixel with each of its neighbors in a fixed order, starting from the left-center pixel and proceeding counter-clockwise.
FIGURE 4.1: Design
If a neighbor has a lower intensity than the center pixel, it is assigned a zero; otherwise a one. This generates an 8-bit binary number, whose decimal value indexes a 256-bin histogram in which the corresponding count is increased by one. The complete LBP histogram of an image then depicts the frequency of each individual binary pattern in the image. By design, the feature vectors are robust to monotonic intensity variations, since the LBP operator is not affected by the size of the intensity difference. Nor are the feature vectors affected by small translations of the face, since the same patterns accumulate in the histogram regardless of their positions.
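A minimal NumPy sketch of the basic LBP operator and its 256-bin histogram follows. The neighbor ordering used here is one fixed circular order (any fixed order works equally well), and the demonstration at the end confirms the invariance to monotonic intensity changes noted above:

```python
import numpy as np

def lbp_code(patch):
    """Basic LBP code of a 3x3 patch: each of the 8 neighbors contributes
    a 1-bit if its intensity is >= the center pixel, else a 0-bit."""
    center = patch[1, 1]
    # One fixed circular ordering of the 8 neighbors, starting at left-center.
    order = [(1, 0), (0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
    bits = [1 if patch[r, c] >= center else 0 for r, c in order]
    return sum(b << i for i, b in enumerate(bits))

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    h, w = image.shape
    hist = np.zeros(256, dtype=int)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            hist[lbp_code(image[r - 1:r + 2, c - 1:c + 2])] += 1
    return hist

img = np.arange(25).reshape(5, 5) % 7
# Monotonic intensity shifts leave the histogram unchanged.
print((lbp_histogram(img) == lbp_histogram(img + 10)).all())  # True
```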
FIGURE 4.2: The basic LBP operator
FIGURE 4.3: Detection Model
Local Binary Pattern: Local Binary Pattern (LBP) is one of the most popular methods for face recognition. It was originally used in texture analysis, but it has also been adopted for face recognition. It encodes each pixel of a gray-value image into a meaningful label by using the pixel's gray value as a threshold for analyzing its relationship with its neighbors. The input image is divided into several regions; a local histogram of the labels is computed for each region, and the local histograms are combined into one large
FIGURE 4.4: Emotion Model
spatial histogram. To compare the similarity of any two images, one need only compute the similarity of their spatial histograms using a weighted chi-square distance. The execution time of this algorithm is very short, and its accuracy on the AR dataset can exceed 95%. The method still faces problems under difficult lighting conditions: although Local Binary Patterns are robust to monotonic changes of illumination, strong lighting concentrated on one part of the face affects performance. X. Tan et al. developed a general form of LBP, called the Local Ternary Pattern (LTP), which is less sensitive to noise.
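The weighted chi-square comparison of region-wise histograms mentioned above can be sketched as follows; the histograms and region weights are toy values chosen only to show the effect of up-weighting a region:

```python
import numpy as np

def chi_square_distance(h1, h2, weights=None, eps=1e-10):
    """Weighted chi-square distance between two sets of region-wise
    histograms, shaped (num_regions, num_bins); one weight per region."""
    if weights is None:
        weights = np.ones(h1.shape[0])
    per_bin = (h1 - h2) ** 2 / (h1 + h2 + eps)   # eps avoids 0/0 bins
    return float((weights * per_bin.sum(axis=1)).sum())

# Two regions, two bins each: region 0 differs, region 1 matches exactly.
a = np.array([[4.0, 0.0], [2.0, 2.0]])
b = np.array([[0.0, 4.0], [2.0, 2.0]])
print(chi_square_distance(a, b))                                # ~8.0
print(chi_square_distance(a, b, weights=np.array([2.0, 1.0])))  # ~16.0
```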
4.2
Experimental and Analytical Work Completed in
the Project
The various aspects of different types of face-detection algorithms have been analyzed theoretically and their working has been documented. The Viola-Jones algorithm is a widely used mechanism for object detection. The main property of this algorithm is that training is slow, but detection is fast. The algorithm uses Haar basis feature filters, so it does not use multiplications. The efficiency of the Viola-Jones algorithm can be significantly increased by first generating the integral image. The database used for both algorithms contains images of objects classified into pre-defined classes; after performing their respective computations, the classifiers match their proposed class against the actual class from the database.
The Convolutional Neural Network is a Deep Learning approach whose working principle is loosely inspired by the human brain. There are three basic layers, which in turn can have many other layers in between, depending on the modification and the problem. In our project we have implemented both of these classifiers and compared their efficiency.
4.3
Modeling, Analysis And Design
4.3.1
Theoretical analysis of the classifiers
Viola-Jones Algorithm: For facial expression detection, it is important to detect the facial region first. The Viola-Jones algorithm is one of the most popular techniques used for facial detection and analysis. It is widely popular since it is robust, real-time, and has a high detection rate. The Viola-Jones algorithm is essentially executed in four stages: Haar feature selection, creating an integral image, AdaBoost training, and cascading classifiers. All human faces share some similar properties, and these regularities may be matched using Haar features. Some common properties of human faces are:
• The eye region is darker than the upper cheeks.
• The nose bridge region is brighter than the eyes.
• Composition of properties forming matchable facial features:
• Location and size: eyes, mouth, bridge of the nose.
• Three types of features: two-, three-, and four-rectangle features; Viola and Jones used two-rectangle features.
• For example: the difference in brightness between the white and black rectangles over a specific area.
The integral image that makes this speed-up possible is defined, for an input image Y, as

H(x, y) = ∑_{p=0}^{x} ∑_{q=0}^{y} Y(p, q)    (4.1)
Each face recognition filter (from the set of N filters) contains a set of cascade-connected
classifiers. Each classifier looks at a rectangular subset of the detection window and
determines if it looks like a face. If it does, the next classifier is applied. If all classifiers
give a positive answer, the filter gives a positive answer and the face is recognized.
Otherwise the next filter in the set of N filters is run.
An image representation called the integral image evaluates rectangular features in constant time, which gives them a considerable speed advantage over more sophisticated alternative features. Since each feature's rectangular area is always adjacent to at least one other rectangle, it follows that any two-rectangle feature can be computed in six array references, any three-rectangle feature in eight, and any four-rectangle feature in nine. The integral image at location (x, y) is the sum of the pixels above and to the left of (x, y), inclusive. AdaBoost training is a kind of learning algorithm that can be used to improve face-detection performance in the Viola-Jones algorithm. Although AdaBoost training can be used as a performance enhancer, it depends on a set of parameters, such as the dataset, training time, and so on.
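Equation 4.1 and the constant-time rectangle sums it enables can be sketched in NumPy as follows; the four-lookup formula is the standard inclusion-exclusion identity on the integral image:

```python
import numpy as np

def integral_image(img):
    """H(x, y) = sum of all pixels above and to the left of (x, y), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(H, top, left, bottom, right):
    """Sum over the rectangle [top..bottom] x [left..right] using at most
    four lookups in the integral image (inclusion-exclusion)."""
    total = H[bottom, right]
    if top > 0:
        total -= H[top - 1, right]
    if left > 0:
        total -= H[bottom, left - 1]
    if top > 0 and left > 0:
        total += H[top - 1, left - 1]
    return total

img = np.arange(16).reshape(4, 4)
H = integral_image(img)
print(rect_sum(H, 1, 1, 2, 2), img[1:3, 1:3].sum())  # 30 30
```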
Cascading Classifiers: Cascading in general means chaining stages together. Cascading classifiers can be used for grouping similar features in a facial image. In a cascade, each stage comprises a strong classifier, so all of the features are grouped into several stages, where each stage has a certain number of features. The job of each stage is to determine whether a given sub-window is definitely not a face or may be a face. A given sub-window is immediately discarded as not a face if it fails any of the stages.
A straightforward scheme for cascade training is given below. The user selects values for f, the maximum acceptable false-positive rate per layer, and d, the minimum acceptable detection rate per layer, as well as a target overall false-positive rate F_target.
P = set of positive examples
N = set of negative examples
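The early-rejection behavior of the trained cascade can be sketched as follows; the stage score functions and thresholds below are purely hypothetical stand-ins for the boosted strong classifiers produced by AdaBoost training:

```python
def cascade_detect(window, stages):
    """Attentional cascade: each stage is a (score_fn, threshold) pair.
    A sub-window is discarded as soon as any stage's score falls below
    its threshold; only windows surviving every stage may be faces."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False      # immediately discarded: definitely not a face
    return True               # passed all stages: may be a face

# Toy stages; the score functions are hypothetical stand-ins for trained
# strong classifiers.
stages = [
    (lambda w: w["mean_intensity"], 0.2),   # cheap, permissive early stage
    (lambda w: w["eye_darkness"], 0.5),     # stricter later stage
]
print(cascade_detect({"mean_intensity": 0.9, "eye_darkness": 0.8}, stages))  # True
print(cascade_detect({"mean_intensity": 0.1, "eye_darkness": 0.9}, stages))  # False
```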
CNN: The Convolutional Neural Network is a multilayer architecture specially designed for the recognition of two-dimensional image data. It always has several layers: an input layer, convolutional layers, sampling (pooling) layers, and an output layer; in a deep network design, the convolutional and sampling layers can have multiple sub-layers. A CNN is not as densely connected as a Boltzmann machine, where every neuron connects to all neurons in the neighboring layer: in a convolutional network, each neuron does not have to connect to every part of the image, but only to a local neighborhood of it. Moreover, the parameters of the neurons in a feature map are set to be equal (weight sharing), so every neuron applies the same convolution kernel to the image. Typical CNN structures mostly consist of the input layer, convolutional layers, pooling layers, fully connected layers, and the output layer. The CNN algorithm has two principal operations: convolution and sampling. The convolution process uses a trainable filter Fx to convolve the input (in the first stage the input image, afterwards the feature map of the previous layer, namely the Feature Map), and then adds a bias.
The key ideas of the CNN are the local receptive field, weight sharing, and sub-sampling in time or space, used to extract features and reduce the size of the training parameters. The benefit of the CNN algorithm is that it avoids explicit feature extraction, learning implicitly from the training data; because the same neuron weights are shared across a feature map, the network can learn in parallel, reducing its complexity; and by adopting a sub-sampling structure in time or space, it can achieve some degree of robustness to scale and deformation. Input data and the network topology can be a good match, which gives CNNs outstanding advantages in speech recognition and image processing.
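The two principal CNN operations named above, convolution with a trainable filter plus a bias and sub-sampling, can be sketched in plain NumPy (a naive, single-channel illustration, not an efficient implementation):

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Single-channel 'valid' convolution: slide the trainable filter over
    the image, take the weighted sum at each position, then add a bias."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for r in range(h):
        for c in range(w):
            out[r, c] = (image[r:r + kh, c:c + kw] * kernel).sum() + bias
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: the sub-sampling step."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
fmap = conv2d_valid(img, np.ones((3, 3)) / 9.0)  # 3x3 averaging filter -> 2x2 map
pooled = max_pool(fmap)                           # 2x2 map -> 1x1 after pooling
print(fmap.shape, pooled.shape)  # (2, 2) (1, 1)
```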
Chapter 5
RESULTS, DISCUSSIONS AND
CONCLUSIONS
Our complete real-time pipeline, including face detection and emotion analysis, has been fully integrated in our work. A few common misclassifications can be seen, for instance predicting "sad" instead of "fear" and predicting "angry" instead of "disgust". A comparison of the learned features across several emotions can be made for both of our proposed models. It can be seen that the CNN learned to be activated by features such as the chin, the teeth, the eyebrows, and the widening of one's eyes, and that each feature remains consistent within the same class. These results reassure us that the CNN learned to extract interpretable, human-like features that generalize. These interpretable outcomes have also helped us understand several common misclassifications, such as persons with glasses being classified as "angry": this occurs because the label "angry" is highly activated when the model believes a person is frowning, and frowning features get confused with darker glass frames.
5.1
Results and Analysis
FIGURE 5.1: Phase 1 Output.
FIGURE 5.2: Phase 2 Output.
5.2
Comparative Study
FIGURE 5.3: Confusion matrix of the system
5.3
Discussions
Commonly used CNNs for feature extraction include a set of fully connected layers at the end, and these layers tend to contain the vast majority of the parameters in a CNN. In particular, VGG16 contains around 90% of all its parameters in its last fully connected layers[10]. Recent architectures, such as Inception V3, reduced the number of parameters in their last layers by including a Global Average Pooling operation[12]. Global Average Pooling reduces each feature map to a scalar value by taking the average over all its elements; this averaging forces the network to extract global features from the input image. Modern CNN architectures such as Xception leverage the combination of two of the most successful experimental ideas in CNNs: the use of residual modules and depth-wise separable convolutions[2]. Depth-wise separable convolutions reduce the number of parameters further by separating the processes of feature extraction and feature combination within a convolutional layer. Furthermore, the state-of-the-art model for the FER-2013 dataset is based on a CNN trained with a squared hinge loss[13]. This model achieved an accuracy of 71% using approximately 5 million parameters[4]; in this architecture, 98% of all parameters are located in the last fully connected layers. The second-best techniques introduced in [4] achieved an accuracy of 66% using an ensemble of CNNs.
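The parameter saving from depth-wise separable convolutions discussed above can be checked with simple arithmetic; the layer sizes below are illustrative, not the exact dimensions of any cited model:

```python
# Parameter count of one convolutional layer (bias terms ignored):
# a standard k x k convolution versus a depth-wise separable one
# (a k x k depth-wise step followed by a 1x1 point-wise step).
def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixing the channels
    return depthwise + pointwise

# Illustrative layer sizes.
k, c_in, c_out = 3, 128, 128
print(standard_conv_params(k, c_in, c_out))   # 147456
print(separable_conv_params(k, c_in, c_out))  # 17536
```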
5.4
Conclusions
A framework was proposed and tested for creating real-time CNNs. The proposed architectures have been systematically built to reduce the number of parameters: in general, this is done by completely removing the fully connected layers and by cutting down the number of parameters in the remaining layers via depth-wise separable convolutions. It was seen that our proposed models can be stacked for multi-class classification while maintaining real-time inference. Specifically, we developed a vision system that performs face detection and emotion classification in a single integrated module. The system has achieved human-level performance on our classification tasks using a single CNN that leverages modern architecture constructs. This architecture reduces the number of parameters while obtaining favorable results. Finally, we presented a visualization of the learned features in the CNN using guided back-propagation. This visualization technique is able to show the high-level features learned by the model and allows their interpretability to be discussed.
5.5
Scope for Future Work
AI models are biased in accordance with their training data. In our particular application, we have found empirically that CNNs trained for gender classification are biased towards western facial features and facial accessories. Furthermore, as discussed previously, the use of glasses might affect emotion classification by interfering with the learned features. The use of glasses can likewise interfere with gender classification; this may be a consequence of the training data having most of its images of people wearing glasses assigned the label "man". We believe that uncovering such behaviors is of extreme importance when building robust classifiers, and that visualization techniques such as guided back-propagation will become significant for revealing model biases.
Bibliography
[1]G. Muhammad, ”Automatic speech recognition using interlaced derivative pattern
for cloud based healthcare system,” Cluster Computing, vol. 18, pp. 795-802, 2015.
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning
for image recognition. In Proceedings of the IEEE conference on computer vision
and pattern recognition,.
[3] Yichuan Tang. Deep learning using linear support vector machines., arXiv preprint
arXiv:1306.0239, 2013.
[4] Dario Amodei et al. Deep speech 2: End-to-end speech recognition in english and
mandarin. CoRR, abs/1512.02595, 2015..
[5] Ian Goodfellow et al. Challenges in Representation Learning: A report on three
machine learning contests, 2013.
[6]Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural
networks. In Proceedings of the Fourteenth International Conference on Artificial
Intelligence and Statistics, pages 315–323, 2011.
[7]M. S. Hossain, “A patient’s state recognition system for healthcare using speech
and facial expression,” Journal of Medical Systems (Springer), vol. 40, no. 12, pp.
272:1-272:8, December 2016.
[8]S. Kumar and S. K. Singh, ”Monitoring of pet animal in smart cities using animal
biometrics,”FutureGenerationComputerSystems,2016.
[9]M. S. Hossain, “Cloud-supported Cyber-Physical Framework for Patients Monitoring,” IEEE Systems J., vol. 11, no. 1, March 2017.
[10]L. Hu, et al., “Software Defined Healthcare Networks,” IEEE Wireless Communication magazine, vol. 22, no. 6, pp.67-75, Dec. 2015.
34
[11]M. S. Hossain, M. A. Rahman, and G. Muhammad, “Cyber Physical CloudOriented Multi-Sensory Smart Home Framework for Elderly People: Energy Efficiency Perspective,” Journal of Parallel and Distributed Computing, vol. 103, no.
2017, pp. 11-21, May 2017
[12]M. S. Hossain, and G. Muhammad, “Healthcare Big Data Voice Pathology Assessment Framework,” IEEE Access, vol. 4, no. 1, pp. 7806-7815, December 2016.
[13]G. Muhammad and M. F. Alhamid, “User Emotion Recognition from a Larger
Pool of Social Network Data Using Active Learning,” Multimedia Tools and Applications, 2016.
[14]K. Lawrence, C. Ruth, and D. Skuse. “Age, Gender, and Puberty Influence the
Development of Facial Emotion Recognition.” Frontiers in Psychology 6 (2015):
761.
[15]G. Domes, E. Kumbier, M. Heinrichs, and S. C. Herpertz, ”Oxytocin Promotes Facial Emotion Recognition and Amygdala Reactivity in Adults with Asperger Syndrome,” Neuropsychopharmacology, vol. 39, pp. 698–706, 2014.
[16]M. Shamim Hossain and G. Muhammad, ”Audio-Visual Emotion Recognition using Multi-Directional Regression and Ridgelet Transform,” Journal on Multimodal
User Interfaces, vol. 10, no. 4, pp. 325-333, 2016.
[17]C. K. Yogesh, M. Hariharan, R. Ngadiran, A. H. Adom, S. Yaacob, C. Berkai, and
K. Polat, “A new hybrid PSO assisted biogeography-based optimization for emotion
and stress recognition from speech signal,” Expert Systems with Applications, vol.
69, no. 1, pp. 149-158, March 2017.
[18]J. B. Alonso, J. Cabrera, M. Medina, and C.M. Travieso, “New approach in quantification of emotional intensity from the speech signal: Emotional temperature,”
Expert Systems with Applications, vol. 42, pp. 9554–9564, 2015.
[19]H. Cao, R. Verma, and A. Nenkova, “Speaker-sensitive emotion recognition via
ranking: Studies on acted and spontaneous speech,” Computer Speech and Language, 28(1), pp. 186–202, 2015.
[20] A. Stuhlsatz, C. Meyer, F. Eyben, T. Zielke, G. Meier, and B. Schuller, "Deep neural networks for acoustic emotion recognition: Raising the benchmarks," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, 2011, pp. 5688-5691.
[21] K. Wang, N. An, B. N. Li, Y. Zhang, and L. Li, "Speech Emotion Recognition Using Fourier Parameters," IEEE Transactions on Affective Computing, vol. 6, no. 1, pp. 69-75, Jan.-Mar. 2015.
[22] Y. Sun, G. Wen, and J. Wang, "Weighted spectral features based on local Hu moments for speech emotion recognition," Biomedical Signal Processing and Control, vol. 18, pp. 80-90, 2015.
[23] H. Muthusamy, K. Polat, and S. Yaacob, "Improved Emotion Recognition Using Gaussian Mixture Model and Extreme Learning Machine in Speech and Glottal Signals," Mathematical Problems in Engineering, vol. 2015, Article ID 394083, 13 pages, 2015.
[24] Q. Mao, M. Dong, Z. Huang, and Y. Zhan, "Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks," IEEE Transactions on Multimedia, vol. 16, no. 8, pp. 2203-2213, Dec. 2014.
[25] I. Shahin and M. N. Ba-Hutair, "Talking condition recognition in stressful and emotional talking environments based on CSPHMM2s," International Journal of Speech Technology, vol. 18, no. 1, pp. 77-90, Mar. 2015.
[26] S. Deb and S. Dandapat, "A novel breathiness feature for analysis and classification of speech under stress," in Proc. 21st National Conference on Communications (NCC), Mumbai, 2015, pp. 1-5.
[27] A. Dehghan, E. G. Ortiz, G. Shu, and S. Z. Masood, "DAGER: Deep Age, Gender and Emotion Recognition Using Convolutional Neural Network," arXiv:1702.04280, 2017.
[28] R. Jiang, A. T. S. Ho, I. Cheheb, N. Al-Maadeed, S. Al-Maadeed, and A. Bouridane, "Emotion recognition from scrambled facial images via many graph embedding," Pattern Recognition, vol. 67, pp. 245-251, Jul. 2017.
[29] H. Qayyum, M. Majid, S. M. Anwar, and B. Khan, "Facial Expression Recognition Using Stationary Wavelet Transform Features," Mathematical Problems in Engineering, vol. 2017, Article ID 9854050, 9 pages, 2017.
Appendix A
Code
MATLAB
MATLAB® is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment in which problems and solutions are expressed in familiar mathematical notation.
Typical uses of MATLAB include:
• Math and computation
• Algorithm development
• Data acquisition
• Modeling, simulation, and prototyping
• Data analysis, exploration, and visualization
• Scientific and engineering graphics
Introduction
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment in which problems and solutions are expressed in familiar mathematical notation. MATLAB stands for "matrix laboratory"; it was originally written to provide easy access to the matrix software developed by the LINPACK (linear system package) and EISPACK (eigensystem package) projects. MATLAB is therefore built on a foundation of sophisticated matrix software in which the basic element is an array that does not require pre-dimensioning, allowing many technical computing problems, especially those with matrix and vector formulations, to be solved in a fraction of the time.
MATLAB features a family of application-specific solutions called toolboxes. Very important to most users of MATLAB, toolboxes allow learning and applying specialized technology. They are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, and simulation, among others. Typical uses of MATLAB include math and computation, algorithm development, data acquisition, modeling, simulation, and prototyping, data analysis, exploration, and visualization, scientific and engineering graphics, and application development, including graphical user interface building.
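The array-first workflow described above (dynamically sized arrays with no pre-dimensioning, direct matrix and vector operations) can be sketched in Python with NumPy as a rough analogue of MATLAB's matrix syntax. This is only an illustration; NumPy and the particular linear system solved here are assumptions, not code from the project:

```python
import numpy as np

# A 2-D array is created without any pre-dimensioning,
# mirroring MATLAB's dynamically sized matrices.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

# Solve the linear system A x = b (MATLAB equivalent: x = A \ b).
x = np.linalg.solve(A, b)

# Verify the solution by substitution: A x - b should be ~0.
residual = A @ x - b
print(x)                          # -> [0.1 0.6]
print(np.allclose(residual, 0))   # -> True
```

As in MATLAB, the whole-matrix operations (`@`, `np.linalg.solve`) replace explicit element-by-element loops.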
FIGURE A.1: Phase 1 MATLAB code.
FIGURE A.2: Phase 1 MATLAB code.
FIGURE A.3: Phase 1 MATLAB code.
FIGURE A.4: Phase 1 MATLAB code.
FIGURE A.5: Phase 1 MATLAB code.
FIGURE A.6: Phase 2 Python code.
FIGURE A.7: Phase 2 Python code.