Wireless Personal Communications (2023) 129:2213–2237
https://doi.org/10.1007/s11277-023-10235-4
Sentiment Analysis and Sarcasm Detection using Deep
Multi‑Task Learning
Yik Yang Tan1 · Chee‑Onn Chow1 · Jeevan Kanesan1 · Joon Huang Chuah1 · YongLiang Lim1
Accepted: 20 February 2023 / Published online: 4 March 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023
Abstract
Social media platforms such as Twitter and Facebook have become popular channels for
people to record and express their feelings, opinions, and feedback in the last decades.
With proper extraction techniques such as sentiment analysis, this information is useful in
many aspects, including product marketing, behavior analysis, and pandemic management.
Sentiment analysis is a technique to analyze people’s thoughts, feelings and emotions, and
to categorize them into positive, negative, or neutral. There are many ways for someone
to express their feelings and emotions. These sentiments are sometimes accompanied by
sarcasm, especially when conveying intense emotion. Sarcasm is defined as a positive sentence with an underlying negative intention. Most current research treats sentiment analysis and sarcasm detection
as two distinct tasks, each approached as a standalone text categorization problem. In recent years,
research using deep learning algorithms has significantly improved the performance of
these standalone classifiers. One major issue with these approaches is that they
cannot correctly classify sarcastic sentences as negative. With this in mind, we claim
that knowing how to spot sarcasm will help sentiment classification and vice versa. Our
work has shown that these two tasks are correlated. This paper proposes a multi-task learning-based framework utilizing a deep neural network to model this correlation to improve
sentiment analysis’s overall performance. The proposed method outperforms the existing
methods by a margin of 3%, with an F1-score of 94%.
Keywords Sentiment analysis · Sarcasm detection · Deep learning algorithm · Multi-task
learning
* Chee‑Onn Chow
cochow@um.edu.my
1
Department of Electrical Engineering, Faculty of Engineering, Universiti Malaya,
50603 Kuala Lumpur, Malaysia
Y. Y. Tan et al.
1 Introduction
According to Statista, there are 5.03 billion active internet users worldwide, about 63% of
the global population, as of October 2022. Of this total, 93% are social media users [1].
Social media, such as Twitter, Reddit, and Facebook, has become an integral part of our
lives. We share almost everything on social media, from social and business events to personal opinions and emotions. Social media is also a popular and reliable platform
for obtaining information in near real time. People place strong trust in the information
received from and shared by other users on social media. In other words, people can inform and
influence one another via social media platforms. This has significant social, political, and
economic impacts on society.
Nowadays, almost all businesses are using social media to engage directly with their
consumers to understand their needs and advertise their products or services. Consumers have complete control of what they want to see and how they want to respond. A single product review may affect consumer behavior and decision-making. As a result, the
company’s successes and failures are made public and spread quickly and widely through
social media platforms. For example, a survey by Podium reported that 93% of internet users are influenced by customer reviews in their purchasing decisions [2]. A
company that keeps up with its customers’ thinking can therefore respond promptly and devise a successful strategy to compete with its rivals.
Another impact of social media was observed during the outbreak of the COVID-19
pandemic that emerged in December 2019 [3], which has infected more than 619 million
people and caused more than 6.55 million deaths as of October 2022, and the number is
still increasing. This has created great fear of infection and tremendous stress in people’s daily lives. According to the American Psychological Association, US adults
recorded their highest stress level since the early days of the COVID-19 pandemic, and 80%
of it was attributed to the prolonged stress caused by COVID-19 [4]. Social media has become one
of the quickest ways for individuals to express themselves, and this causes the newsfeed
to flood with information representing their thoughts. Undoubtedly, analyzing these newsfeeds is a direct way to capture their emotions and sentiment [5].
Sentiment analysis (also known as opinion mining) is the process of identifying, extracting, and classifying subjective information from unstructured text using text analysis and
computational linguistic techniques in Natural Language Processing (NLP) [6]. It aims to
determine the polarity of sentences using word clues extracted from the sentence’s meaning [7, 8]. As a result, sentiment analysis is an essential technique for extracting valuable
information from unstructured data sources, including tweets and reviews. It is widely used
in extracting opinions from online sources, such as product reviews on the Web [9]. Since
then, several other fields have been targeted using sentiment analysis, such as stock market forecasts [10] and responses to terrorist attacks [11]. In addition, research work that
overlaps sentiment analysis and the production of natural language has discussed several
concerns that relate to the applicability of sentiment analysis, such as multi-lingual support
[12] and irony detection [13]. Recently, sentiment analysis has played
a crucial role in understanding people’s feelings during the COVID-19 pandemic. It helps
governments to understand people’s concerns about COVID-19 and take appropriate
measures accordingly [14].
Despite the potential benefits, automated analysis has limitations, including the complexity of its implementation due to the uncertainty of natural language and the characteristics of the posted content. Tweets are an example: they are often accompanied by hashtags, emoticons,
and links, which makes identifying the expressed sentiment difficult. Furthermore, automated techniques need large datasets of annotated posts or lexical databases
of emotional terms associated with sentiment values. Unlike humans, machines have difficulty handling the subjectivity of a text, such as a sarcastic context [15]. In sarcastic texts, people frequently use encouraging words to express negative feelings. This
allows sarcasm to trick sentiment analysis models easily unless they are explicitly designed to
take sarcasm into account. Ultimately, the variation in the terms used in sarcastic sentences
makes it difficult to train a sentiment analysis model.
Owing to the misclassification of sarcastic texts that could change the polarity of a sentence, the primary aim of this paper is to study the effect of sarcasm detection on sentiment analysis to improve the existing sentiment analysis model for better accuracy and
more intelligent information extraction. The contribution of this paper is twofold. First,
we design a general framework that tackles sentiment analysis and sarcasm detection for
more intelligent and accurate information extraction. Second, we propose deep multi-task
learning to simultaneously train two models for sentiment analysis and sarcasm detection,
respectively, to reduce model complexity and increase model efficiency.
This paper is divided into five sections. A comprehensive literature review of the related
work is presented in Sect. 2. Section 3 provides a detailed explanation on the proposed
multi-task learning framework for sentiment analysis and sarcasm detection. Performance
evaluation is presented in Sect. 4. Section 5 gives the conclusions and possible future work.
2 Related Work
The roots of sentiment analysis of written documents can be traced back to World War
II, when the focus was mainly political in nature. It has been an active research focus
since the mid-2000s, utilizing Natural Language Processing (NLP) to mine subjective
information from various contents on the Internet. Different methods have been proposed
to train sentiment analysis models ranging from traditional machine learning algorithms to
deep learning algorithms.
2.1 Sentiment Analysis Using Machine Learning
Most natural language processing algorithms were based on complex sets of hand-written
rules up to the 1980s. Since then, a revolution in natural language processing has been
seen with the introduction of machine learning algorithms. Some early work involved the
classification of sentiment based on the categorization method with positive and negative
sentiments, such as in [7], in which three machine learning algorithms were used in their
experiment for sentiment classification. These algorithms are Support Vector Machine
(SVM), Naïve Bayes classifier, and Maximum Entropy. The classification was carried out using n-gram features, namely unigrams, bigrams, and the combination of
both. They also used the bag-of-words (BOW) representation as input to the
machine learning algorithms. The promising performance
reported in their study showed the great potential of this approach.
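The unigram, bigram, and combined bag-of-words features described above can be illustrated with a minimal sketch. It is not the feature extractor used in [7]; the function names `ngrams` and `bag_of_ngrams` and the whitespace tokenization are assumptions made purely for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, use_unigrams=True, use_bigrams=True):
    """Count unigram and/or bigram occurrences (order of n-grams is ignored).

    Whitespace tokenization is an assumption made for this sketch.
    """
    tokens = text.lower().split()
    features = Counter()
    if use_unigrams:
        features.update(ngrams(tokens, 1))
    if use_bigrams:
        features.update(ngrams(tokens, 2))
    return features


# Unigrams alone lose the negation; the bigram "not good" keeps it.
counts = bag_of_ngrams("the movie was not good not good at all")
```

The resulting counts can then be passed as a sparse feature vector to a classifier such as SVM or Naïve Bayes.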
The syntactic relation between words has been used in [16] to analyze document-level
sentiment. In that work, frequent word sub-sequences and dependency sub-trees were derived from sentences and served as features for the SVM algorithm. Unigram,
bigram, word sub-sequence, and dependency features were also extracted from each sentence
in the dataset. Another similar work uses a mix of unsupervised and supervised
techniques to learn word vectors that capture both semantic term-document
information and rich sentiment content [17].
In [18], a mechanism that embeds higher-order n-gram phrases into a low-dimensional
semantic latent space was proposed to define a sentiment classification function. The authors also used an SVM to construct a discriminative system that estimates the latent space
parameters with a bias towards the classification task. This method can perform both
binary classification and multi-score sentiment classification, which involves prediction
within a set of sentiment scores.
A sentiment classification method using an entropy weighted genetic algorithm (EWGA)
and SVM has been proposed in [19]. Different sets of features consisting of syntactic and
stylistic characteristics were evaluated. The stylistic features include measures of
word length distribution, vocabulary richness, and the frequency of special characters. Weights
are allocated to the different sentiment attributes before the genetic algorithm is used to optimize the sentiment classification. SVM with ten-fold cross-validation was used
to validate the model, and promising results were obtained.
2.2 Sentiment Analysis using Deep Learning
In recent years, deep learning has been widely adopted because it
does not require traditional, task-specific feature engineering, which makes it a more powerful alternative for sentiment analysis. In [20], an architecture using a deep neural network to determine the similarity of documents was proposed. This architecture was trained
to generate vectors for articles using market news items obtained from T&C. The
cosine similarity was then calculated among the labeled articles, considering the polarity
of the documents while neglecting their contents. The proposed method
achieved outstanding performance in terms of similarity estimates of the articles.
The authors in [21] suggested a sequence-modeling-based neural network for sentiment
analysis at the document level, focusing on customer reviews, which are temporal in nature.
Their approach trained a recurrent neural network with gated recurrent units (RNN-GRU) to learn distributed product and user representations. These representations were
then fed into a machine learning classifier for sentiment classification. The method was
evaluated on three datasets obtained from Yelp and IMDb. Each review was
tagged according to its rating score, and the back-propagation algorithm was used to train
the network, with the Adam stochastic optimizer minimizing the loss function. Simulation
results showed that sequence modeling of distributed product and user representations enhances the document-level performance of sentiment classification.
In [22], sentence-level sentiment analysis using a deep Recurrent Neural
Network (RNN) composed of Long Short-Term Memory units (RNN-LSTM) was proposed
because sentiment analysis that considers a unified feature set, including word embeddings, sentiment knowledge, sentiment shifter rules,
and statistical and linguistic knowledge, had not been previously studied. This combination
provided sequential processing and overcame some of the flaws in conventional
methods. A hybrid deep learning method called ConvNet-SVMBoVW was proposed in
[23] to deal with real-time data for fine-grained sentiment analysis. An aggregation
model was created to calculate the hybrid polarity, and an SVM was trained on
bag-of-visual-words (BoVW) features to predict the sentiment of visual content. The proposed
method not only provided five levels of fine-grained sentiment analysis (highly positive, positive, neutral, negative, and highly negative) but also outperformed the existing
methods.
The study in [24] examined the sentiment of Arabic online reviews of cars and
real estate. The authors used Bi-LSTM (Bidirectional Long Short-Term
Memory), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), CNN (Convolutional Neural Network), and CNN-GRU deep learning algorithms in combination
with the BERT word embedding model. The real estate dataset had about 6,434 opinions,
whereas the automotive dataset included nearly 6,585 opinions. The records in both datasets were assigned one of three sentiment
classes (negative, positive, and mixed). The
BERT model with the LSTM produced the highest F1 score of 98.71% for the automotive dataset, while the BERT model with the CNN produced the highest F1
score of 98.67% for the real estate dataset.
Recently, multi-task learning has gained substantial attention in deep learning research.
Multi-task learning allows multiple tasks to be performed simultaneously by a shared
model. The authors of [25] proposed a multi-task learning approach based on CNN and RNN. This model
jointly learns citation sentiment classification (CSC) and citation purpose classification
(CPC) to boost the overall performance of automated citation analysis and simultaneously
ease the problems of inadequate training data and time-consuming feature engineering.
The authors of [26] proposed a method utilizing sarcasm detection to improve standalone sentiment
classifiers, similar to our proposed method. This method required two explicitly trained
models: the sentiment model and the sarcasm model. Feature extraction for sarcasm
detection used top word features, unigrams, and 4 Boaziz features, which consist of punctuation-related features, sentiment-related features, and lexical and syntactic features. Sarcasm detection used the Random Forest algorithm, while sentiment classification used
the Naïve Bayes algorithm. The model achieved 80.4% accuracy, 91.3% recall, and 83.2%
precision. The evaluation also showed that sarcasm detection improves the performance of
sentiment analysis by about 5.49%.
This section has introduced different techniques used by researchers for sentiment analysis, such as n-gram, hybrid machine learning technique (MLT), and deep learning methods. Finding a suitable
dataset and converting the data to numerical vector form are crucial steps that all researchers take
to obtain better results. The accuracy obtained from various methods is high; for example, the
n-gram method achieved an accuracy of 94.6% using SVM [18], and the hybrid MLT
method achieved 91.7% using a hybrid of EWGA and SVM [19]. Most of the deep
learning methods outperformed the conventional ones. However, some shortcomings can
be observed in these methods. As discussed in the previous section, sarcastic context plays a
vital role in sentiment classification. If sarcasm is not considered in the system, a
sarcastic text is classified as positive, which leads to misclassification. An additional step is required to resolve this misclassification and obtain a more accurate outcome.
A significant milestone has been achieved in the field of sentiment analysis in the last
decades in mining and understanding text-based documents for various purposes using
machine learning and deep learning methods. However, there are limitations in the existing techniques due to the inherent nature of languages, such as using sarcasm in expressing
feelings. This issue was tackled in [26] using two explicitly trained models for each task.
This method provides more accurate sentiment analysis but comes with a price of higher
complexity, longer processing time, and possible overfitting. In this paper we propose a
complete framework that enhances sentiment classification by detecting the presence of
sarcasm in the sentence. In order to reduce the complexity and processing time, multi-task
learning is deployed in our framework.
Fig. 1 Sentiment analysis and sarcasm detection using deep multi-task learning: overall framework
3 Deep Multi‑Task Learning Framework for Sentiment Analysis
with Sarcasm Detection
Sarcasm is defined as the use of remarks that clearly carry the opposite meaning or sentiment. It is used to mock or annoy someone, or for humorous purposes. The
presence of sarcasm in a document makes sentiment analysis less accurate, as conventional methods are not capable of detecting sarcasm. In this paper, we propose a framework for sentiment analysis that considers the possible presence of sarcasm. Specifically,
the framework uses pre-processed data to train a Bidirectional LSTM (Bi-LSTM)
network to simultaneously perform two tasks: sentiment classification and sarcasm classification. This method is called multi-task learning and has been shown to improve learning
efficiency and prediction accuracy for task-specific models. In addition, each classifier in the
proposed framework has a distinct perceptron layer, and a different activation function is
used in each perceptron layer depending on whether the classification task is binary or multiclass. The architecture of the proposed framework is given in Fig. 1, and each
component of the framework is explained in the remainder of this section.
Fig. 2 a Sentiment header data; b Amount of data for different label plotted in a graph
Fig. 3 a Sarcasm header data; b Amount of data for different label plotted in a graph
3.1 Data Acquisition
Data acquisition is the first step in machine and deep learning and usually involves
spending considerable time and resources on gathering data that may or may not be relevant.
Two datasets are needed in the proposed framework: a sentiment dataset and a sarcasm
dataset. In this paper, both datasets are obtained from Kaggle. The sentiment analysis
dataset [27], which is extracted from Twitter, contains 227,599 tweets labeled as 0 for
neutral, 1 for negative sentiment, and 2 for positive sentiment. To give a better
understanding of the dataset, some sample headers are shown in Fig. 2a, and the label distribution is given in Fig. 2b. It is important to note that the dataset is slightly unbalanced,
with the neutral label being the majority and the negative one being the minority. As
for the sarcasm dataset [28], there are 28,619 tweets labeled with 0 for not sarcastic
and 1 for sarcastic. Figure 3a shows some samples of the sarcastic and not sarcastic
headers of tweets. Similarly, the dataset is slightly unbalanced, with more tweets that
are not sarcastic (label 0), as shown in Fig. 3b.
3.2 Data Pre‑Processing
Most of the datasets available online or collected manually are noisy and unstructured in
nature. So pre-processing is needed to transform the noisy datasets into an understandable format for the training to ensure high accuracy. In this paper, the same pre-processing
method is used for both datasets since both sentiment and sarcasm classifications use the
same natural language processing (NLP) techniques. The pre-processing steps used in our
work are summarized as follows:
a. Removal of irrelevant words such as hyperlinks and noisy words (such as retweet markers,
stock market tickers, and stop words).
b. Removal of punctuation.
c. Word stemming to reduce inflected words to their word stem or base form.
d. Tokenization to convert text to a vector representation as input to the deep learning model.
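The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the stop-word list, the regular expressions, and the suffix-stripping `stem` function are hypothetical stand-ins (a real system would likely use an established stemmer and stop-word list), and none of the names below come from the paper.

```python
import re
import string

# Assumption: a tiny illustrative stop-word list and suffix list, not the paper's.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "so", "i"}
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text):
    """Steps a and b: drop links, retweet markers, tickers, punctuation, stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # hyperlinks
    text = re.sub(r"\brt\b", " ", text)                  # retweet marker
    text = re.sub(r"\$[a-z]+", " ", text)                # stock tickers such as $AAPL
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOP_WORDS]


def stem(word):
    """Step c: naive suffix stripping (a toy substitute for a real stemmer)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(texts):
    """Step d: map each stemmed token to an integer id (0 is reserved for padding)."""
    token_lists = [[stem(w) for w in clean(t)] for t in texts]
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return [[vocab[t] for t in tokens] for tokens in token_lists], vocab


ids, vocab = preprocess(
    ["RT So many assignments, I love school life so much! https://t.co/x $AAPL"]
)
```

Tickers are removed before punctuation so that the leading `$` is still available to the pattern; the integer ids would then be looked up in the embedding layer.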
3.3 Deep Multi‑Task Learning Neural Network
Figure 4 shows the overall architecture of the multi-task learning deep neural network. In
this stage, the pre-processed data are converted into a numerical vector before they are used
to train the Bi-LSTM network for both sentiment classification and sarcasm classification.
3.3.1 Embedding Layer
Word embedding takes in the pre-processed input and converts each word into a vector
of numeric values. It is a typical method for representing words in text analysis, in which
words with similar meanings are expected to be closer in the vector space. In this paper, the
Word2Vec model is used to generate a word vector matrix to convert the inputs [29]. Word2Vec computes vectors whose cosine similarity represents the degree of semantic similarity between words.
Each text of n words, represented as T = w1, w2, …, wn,
is converted into a dense matrix of n word vectors of dimension d, as given in Eq. 1.
$$T = [w_1, w_2, \ldots, w_n] \in \mathbb{R}^{n \times d} \tag{1}$$
Input texts may have different lengths, but the length of the produced matrices is fixed, denoted as l. With this in mind, zero padding is used if the text has a length
shorter than l. Thus, all texts have the same matrix dimension, and the final form
of the text input has l rows, as given in Eq. 2.
$$T = [w_1, w_2, \ldots, w_n] \in \mathbb{R}^{l \times d} \tag{2}$$
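The two embedding-side operations described in this subsection, cosine similarity between word vectors and zero padding to a fixed length l, can be sketched as follows. The function names are hypothetical, and truncating texts longer than l is an assumption, since the text only describes the case of shorter texts.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pad_text_matrix(word_vectors, l, d):
    """Turn an n x d list of word vectors into exactly l x d.

    Shorter texts are zero padded as in Eq. 2; truncating longer texts
    is an assumption of this sketch.
    """
    rows = [list(vec) for vec in word_vectors[:l]]
    rows.extend([0.0] * d for _ in range(l - len(rows)))
    return rows


text = [[1.0, 2.0], [3.0, 4.0]]        # n = 2 words, d = 2
padded = pad_text_matrix(text, l=4, d=2)
```

Here `padded` has four rows, the last two being zero vectors, so every text presented to the network has the same l × d shape.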
3.3.2 Multi‑Task Learning
The purpose of the proposed framework is to enhance the performance of standalone
sentiment classification by adding an auxiliary task which is sarcasm detection. This
can be achieved by training two explicit models sequentially: the sentiment model as
the primary task and the sarcasm model as the secondary task. However, this method is
inefficient, redundant, and costly in terms of processing time and computation power.
With this in mind, we propose the use of multi-task learning in our framework to train
Fig. 4 Architecture of Multi-Task Learning
these two tasks simultaneously with a shared Bi-LSTM layer. This method offers many
advantages, including reduced overfitting through shared representations and reduced
model complexity, improved data efficiency, and fast learning by leveraging auxiliary
information. Besides, multi-task learning also helps in alleviating known weaknesses
of deep learning methods, such as computational demand and large-scale data requirements [30].
As shown in Fig. 4, the two classification tasks share a single-layer Bi-LSTM recurrent
neural network but have separate multilayer perceptron (MLP) layers. The sentences obtained
from word embedding are fed into the Bi-LSTM network, and the output is then passed
through a fully connected layer (FCL) to get the sentence representation, Z∗. This operation
can be represented by Eq. 3,
$$Z^{*} = \mathrm{FCLayer}^{*}(\mathrm{bidirection}(\mathrm{LSTM}(X))) \tag{3}$$
where * represents the shared layer between the two tasks, and X is the list of input sentences
obtained from the word embedding.
The respective classification is then obtained by applying the activation function to the
sentence representation Z∗. In this framework, the activation functions used are softmax
and sigmoid for sentiment classification and sarcasm classification, respectively, as given
by Eqs. 4a and 4b.
$$S_{\mathrm{sentiment}} = \mathrm{Softmax}(Z^{*}) \tag{4a}$$
$$S_{\mathrm{sarcasm}} = \mathrm{Sigmoid}(Z^{*}) \tag{4b}$$
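A minimal sketch of the two output heads may help make Eqs. 4a and 4b concrete. It assumes that each task first projects the shared representation through its own dense layer (three outputs for sentiment, one for sarcasm) so that the dimensions match; the weights, the `dense` helper, and the `classify` function are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax: probabilities over the sentiment classes."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function: probability that the text is sarcastic."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(z, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weights, bias)]


def classify(z_shared, sentiment_head, sarcasm_head):
    """Apply both task-specific heads to the same shared representation."""
    w_sent, b_sent = sentiment_head
    w_sarc, b_sarc = sarcasm_head
    s_sentiment = softmax(dense(z_shared, w_sent, b_sent))   # Eq. 4a
    s_sarcasm = sigmoid(dense(z_shared, w_sarc, b_sarc)[0])  # Eq. 4b
    return s_sentiment, s_sarcasm


# Hypothetical weights: with all-zero heads the outputs are uniform.
z = [0.5, -1.0]
zero_sentiment = ([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
zero_sarcasm = ([[0.0, 0.0]], [0.0])
probs, p_sarcastic = classify(z, zero_sentiment, zero_sarcasm)
```

The point of the sketch is that both heads read the same vector `z`, which is what allows the sarcasm task to shape the representation used for sentiment.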
3.3.3 Loss Function
The configuration of the framework allows the secondary task to inform the training on
the primary task by computing the loss of the model using Eq. 5:
$$L_i = \sum_{(x,y) \in \Omega_1} L_1(x, y) + \sum_{(x,y) \in \Omega_2} L_2(x, y) \tag{5}$$
where L1 denotes the loss for the primary task, L2 denotes the loss for the secondary
task, and Li is the total loss of the proposed model. The total loss Li is computed over
each sentence present in the dataset Ωi.
The cross-entropy losses are given in Eqs. 6 and 7, respectively, for sentiment classification and sarcasm classification.
$$\text{Categorical Cross Entropy} = -\sum_{i}^{C} t_i \log\left(f(\mathrm{Softmax})_i\right) \tag{6}$$
$$\text{Binary Cross Entropy} = -t_1 \log(s_1) - (1 - t_1)\log(1 - s_1) \tag{7}$$
In order to optimize the performance of the model, the RMSprop optimizer is used in
our framework. At every epoch, the proposed algorithm computes the gradient of Li for
each batch to fine-tune our parameters.
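The combined loss of Eqs. 5 to 7 can be sketched numerically as follows. The sketch assumes that t is a one-hot target vector for sentiment and a 0/1 label for sarcasm, and that the probabilities come from the softmax and sigmoid outputs; the clipping constant `eps` and all function names are assumptions added for numerical safety and illustration.

```python
import math


def categorical_cross_entropy(targets, probs, eps=1e-12):
    """Eq. 6: targets is a one-hot list, probs the softmax output."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(targets, probs))


def binary_cross_entropy(t1, s1, eps=1e-12):
    """Eq. 7: t1 is the 0/1 label, s1 the sigmoid output."""
    s1 = min(max(s1, eps), 1.0 - eps)  # clipping is an assumption, avoids log(0)
    return -t1 * math.log(s1) - (1 - t1) * math.log(1.0 - s1)


def total_loss(sentiment_batch, sarcasm_batch):
    """Eq. 5: sum of the primary (sentiment) and secondary (sarcasm) losses."""
    l1 = sum(categorical_cross_entropy(t, p) for t, p in sentiment_batch)
    l2 = sum(binary_cross_entropy(t, s) for t, s in sarcasm_batch)
    return l1 + l2


# One sentiment example (true class 2) and one sarcasm example (sarcastic).
loss = total_loss([([0, 0, 1], [0.2, 0.3, 0.5])], [(1, 0.5)])
```

Because the two terms are simply added, gradients from the sarcasm loss flow into the shared layers alongside those from the sentiment loss.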
3.3.4 Bi‑Directional Long Short‑Term Memory (Bi‑LSTM)
The use of deep recurrent neural networks (RNNs) has been proven to be highly effective
in sentiment analysis, as RNNs have feedback loops in the recurrent layer that allow
information to be maintained as ‘memory’ over time. Training conventional RNNs to solve
problems that require learning long-term temporal dependencies, such as sentences or
tweets, can be difficult because the gradient of the loss function decays exponentially with
time, known as the vanishing gradient [31]. Hence, we use Bi-LSTM to tackle this problem. Bi-LSTM is a sequence processing model consisting of two LSTMs: one taking the
input in a forward direction and the other in a backward direction. It provides more context
to the algorithm and results in faster and more complete learning of the problem. In other
words, it allows the network to preserve information from both the future and the past.
Fig. 5 Architecture of LSTM
The basic architecture of the LSTM network is illustrated in Fig. 5. The LSTM network
is a type of RNN that uses special units called memory cells, in addition to standard units, to
solve the vanishing gradient problem. An LSTM network accomplishes this by incorporating a cell state and three different gates: the forget gate, the input gate, and the output
gate. Using this explicit gating mechanism, the cell can determine what to do with the state
vector at each step: read from it, write to it, or delete it. The input gate allows the cell to
decide whether to update the cell state or not. The forget gate allows the cell to erase its memory, and the
output gate determines whether to make the output information available.
Ultimately, the ability to ‘memorize’ and forget makes LSTM a suitable method for
the sentiment and sarcasm model because every word present in a sentence is crucial. Furthermore, it is vital to conserve information bi-directionally from the past to the future and
from the future to the past when evaluating the nature of a sentence.
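The gating mechanism and the bidirectional pass described above can be sketched with a scalar LSTM cell. This follows the standard LSTM formulation rather than any implementation detail given in the paper; scalar inputs and states, the parameter dictionary `p`, and all function names are simplifying assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps gate name -> (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # candidate value to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g             # updated cell state ("memory")
    h = o * math.tanh(c)               # hidden state passed to the next step
    return h, c


def run_lstm(xs, p):
    """Run the cell over a sequence, starting from zero states."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def run_bilstm(xs, p_fwd, p_bwd):
    """Pair the forward pass with a backward pass over the reversed sequence."""
    forward = run_lstm(xs, p_fwd)
    backward = run_lstm(xs[::-1], p_bwd)[::-1]
    return list(zip(forward, backward))


# Hypothetical parameters: forget gate open, input gate closed -> memory is kept.
keep = {"forget": (0.0, 0.0, 100.0), "input": (0.0, 0.0, -100.0),
        "candidate": (0.0, 0.0, 0.0), "output": (0.0, 0.0, 0.0)}
h_new, c_new = lstm_step(1.0, 0.0, 0.7, keep)
```

With the forget gate saturated open and the input gate closed, the cell state is carried through unchanged, which is the behavior that lets the network retain information over long sequences.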
3.4 Multi‑Layer Perceptron (MLP)
Figure 6 gives a deeper insight into the multilayer perceptron used in the proposed framework. It can be observed that the primary task and the secondary task have similar architectures except for the activation functions of the dense layer. For both tasks, we apply the
ReLU activation function to set all negative elements present in Z∗ (the outputs of the bidirectional neural network) to 0, as shown in Eq. 8. This simplifies the
backward propagation computation, giving benefits such as fast training and mitigating
vanishing gradients.
$$\mathrm{ReLU}(x) = \max(0, x) \tag{8}$$
Fig. 6 Architecture of Multilayer Perceptron (MLP)
Then, dropout is added to the model as a regularizer, where the algorithm randomly
sets half of the activations on the fully connected layers to zero during training. This technique improves the generalization ability and reduces overfitting [32]. Finally, training an
LSTM-based multi-task model involves remembering broadly diverse sequential data. In
this case, RMSprop is a suitable optimizer as it uses a moving average of squared gradients
to normalize the gradient itself [33]. This has the effect of balancing the step size:
increasing the step size for a small gradient to avoid vanishing gradients and decreasing the
step size for a large gradient to avoid exploding gradients, thereby ensuring efficient learning
by the model [34].
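The step-size balancing effect of RMSprop can be illustrated with a single-parameter update. This is a sketch of the commonly stated RMSprop rule, not the paper's code; the learning rate matches the value reported in Table 1, while the decay rate `rho` and the constant `eps` are assumed typical defaults.

```python
import math


def rmsprop_step(param, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update for a single parameter.

    avg_sq is the moving average of squared gradients; rho and eps are
    assumed defaults, not values reported in the paper.
    """
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
    param = param - lr * grad / (math.sqrt(avg_sq) + eps)
    return param, avg_sq


# A tiny gradient and a huge gradient lead to steps of similar size.
p_small, _ = rmsprop_step(0.0, 1e-3, 0.0)
p_large, _ = rmsprop_step(0.0, 1e3, 0.0)
```

Dividing by the root of the running average rescales both updates to roughly the same magnitude, which is the normalization the paragraph above refers to.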
3.5 Model Analysis
In this section, we provide a preliminary analysis of the performance of the proposed
framework. The proposed model is trained using the hyperparameters defined in Table 1,
and Fig. 7 gives the training and validation losses. The validation loss decreases significantly
from 1.4 to 0.5 between epoch 0 and epoch 15, which shows that the model is learning effectively.
From epoch 15 onwards, the model has stopped learning as the validation loss has reached
a steady state, although the training loss is still decreasing. For this reason, the number of epochs is
set to 10 for training since the validation loss equals the training loss at this number of
epochs.
In terms of complexity, the proposed model is slightly more complex than the conventional deep neural networks due to bi-directional data propagation. Compared to similar
classification work that involves two tasks [26], the proposed model is much simpler due
to the multi-task learning. Figure 7 shows that the validation loss curve remains constant at
around 0.5 instead of going upwards. This indicates that the proposed method successfully
reduces overfitting.
Table 1 Hyperparameter settings
Parameters            Values
Embedding dimension   Word2Vec = 200
Bi-LSTM output size   20
Dropout               0.4
Epoch                 50
Learning rate         0.001
Batch size            128
Validation split      0.2
Loss function         Cross-entropy
Optimizer             RMSprop
Fig. 7 Validation and training losses (model loss vs. epoch; curves: loss and val_loss)
In terms of training time, we compare the performance with the model presented in [26], shown
in Fig. 8, using the same hyperparameters as in our proposed method. The approach in [26]
uses the same input and hidden layer settings twice to train the sentiment and sarcasm
models, which is redundant, as the model training has to be done twice. In contrast,
the multi-task learning method allows us to train both tasks at each epoch and thus significantly
lowers the training time.
In terms of computation power, the proposed method requires less computation power
because the neural network used is smaller than that in [26]. The goal is to shrink the neural network so that unnecessary processes and mathematical operations are removed.
Therefore, data pass through the hidden layer once instead of twice, significantly reducing
unwanted processes and computation power.
Fig. 8 Architecture of two explicitly trained sentiment and sarcasm models
4 Performance Evaluation
4.1 Experiment Settings
Figure 9 shows the overview of the experiments conducted to evaluate the performance of the proposed framework. Two categories of datasets are used. The first experiment compares the proposed framework against various baselines and variant models used in similar works for a fair comparison. For the second experiment, we extracted data from multiple social media platforms, including Twitter and Reddit, to eliminate bias by feeding unseen data to the model.
Table 2 shows the relationship between the sentiment and sarcasm classifications. If a tweet is both positive in sentiment and sarcastic, it is classified as negative. Consider, for example, the tweet “So many assignments, I love school life so much.” The sentiment classifier will detect this tweet as positive because it highlights “love school life so much” as a positive remark. The sarcasm classifier, on the other hand, will identify it as sarcastic due to the phrase “So many assignments.” In this case, the tweet is negative.
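The rule in Table 2 can be written as a small function. This is a minimal sketch rather than the authors' implementation: the function name and the label strings are illustrative, and passing a neutral sentiment through unchanged is an assumption, since Table 2 lists only positive and negative sentiment.

```python
def combine(sentiment: str, sarcastic: bool) -> str:
    """Final polarity following Table 2.

    A positive sentence that is also sarcastic is treated as negative.
    Every other combination keeps the sentiment label (for "neutral",
    this is an assumption; Table 2 does not cover that case).
    """
    if sentiment == "positive" and sarcastic:
        return "negative"
    return sentiment


# "So many assignments, I love school life so much."
# sentiment classifier: positive; sarcasm classifier: sarcastic
print(combine("positive", True))
```

For the example tweet, the function returns "negative", which matches the last row of Table 2.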
Fig. 9 Overview of performance evaluation (diagram labels: Experiment 1 and Experiment 2; Reddit, Twitter, and Twitter US Airline datasets; train/test split; sentiment classification with positive, neutral, and negative outputs; sarcasm classification; evaluation)
Table 2 Sentiment and sarcasm relationship

Sentiment score    Sarcasm score    Output
Negative           Not sarcastic    Negative
Negative           Sarcastic        Negative
Positive           Not sarcastic    Positive
Positive           Sarcastic        Negative
4.2 Evaluation Metrics
The following standard metrics are used in this paper to benchmark the performance of the
proposed framework.
a. Recall (also known as sensitivity) measures the ability of a model to detect positive instances. It is the ratio of true positives (TP) to the sum of true positives and false negatives (FN), as given in Eq. 9.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{9}$$
b. Precision is the accuracy of positive predictions. It is defined as the ratio of true positives to the total predicted positives, which include both true positives (TP) and false positives (FP), as given in Eq. 10.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{10}$$
c. F1-score is a helpful metric for comparing two classifiers. It considers both recall and precision and is defined in Eq. 11.
$$\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$
Another commonly used metric, accuracy, is the ratio of correct predictions to total predictions. Accuracy does not distinguish between false negatives and false positives, which makes it unreliable on imbalanced data. Imagine a dataset of 1000 tweets, of which 900 are sarcastic and 100 are not sarcastic. If our classifier predicts every tweet as sarcastic, the accuracy is a high 90%, yet the F1-score for the not-sarcastic class is 0 because its recall is 0. Hence, our experiments use the F1-score as the primary metric to evaluate the performance of the proposed model.
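The 1000-tweet example can be checked numerically with Eqs. 9 to 11. The sketch below is illustrative: the helper names are not from the paper, and a class with no true positives is given a precision, recall and F1-score of 0 by convention. Scores are computed per class, because the result depends on which class is treated as positive.

```python
def class_counts(y_true, y_pred, positive):
    """TP, FP and FN when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Eqs. 9-11, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# 900 sarcastic and 100 not-sarcastic tweets; everything is predicted sarcastic.
y_true = ["sarcastic"] * 900 + ["not_sarcastic"] * 100
y_pred = ["sarcastic"] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_sarcastic = precision_recall_f1(*class_counts(y_true, y_pred, "sarcastic"))[2]
f1_not_sarcastic = precision_recall_f1(*class_counts(y_true, y_pred, "not_sarcastic"))[2]

print(accuracy, f1_sarcastic, f1_not_sarcastic)
```

Accuracy is 0.9 even though no not-sarcastic tweet is ever detected. The F1-score exposes this: it is 0 for the not-sarcastic class, while for the sarcastic class it is about 0.95.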
4.3 Experiment 1: Model Baselines and Variants
The dataset used is split into 80% for training and 20% for testing. The following baselines
and variations of the models are compared.
a. Standalone classifier using RNN with the following characteristics:
$$Z_{\mathrm{sentiment}} = \mathrm{FCLayer}_{\mathrm{sentiment}}(\mathrm{RNN}(X_{\mathrm{sentiment}}))$$
$$Z_{\mathrm{sarcasm}} = \mathrm{FCLayer}_{\mathrm{sarcasm}}(\mathrm{RNN}(X_{\mathrm{sarcasm}}))$$
$$S_{\mathrm{sentiment}} = \mathrm{Softmax}(Z_{\mathrm{sentiment}})$$
$$S_{\mathrm{sarcasm}} = \mathrm{Sigmoid}(Z_{\mathrm{sarcasm}})$$
In this case, there are two standalone models, one for sentiment and one for sarcasm. X is the list of input sentences obtained from the word embeddings. We feed X to the RNN and then pass the output through a fully connected layer (FCLayer) to get the sentence representation Z. Finally, Z is passed through the dense layer, which uses softmax for sentiment classification and sigmoid for sarcasm classification, giving $S_{\mathrm{sentiment}}$ and $S_{\mathrm{sarcasm}}$, respectively.
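b. Standalone classifier using CNN with the following characteristics:
$$Z_{\mathrm{sentiment}} = \mathrm{FCLayer}_{\mathrm{sentiment}}(\mathrm{CNN}(X_{\mathrm{sentiment}}))$$
$$Z_{\mathrm{sarcasm}} = \mathrm{FCLayer}_{\mathrm{sarcasm}}(\mathrm{CNN}(X_{\mathrm{sarcasm}}))$$
$$S_{\mathrm{sentiment}} = \mathrm{Softmax}(Z_{\mathrm{sentiment}})$$
$$S_{\mathrm{sarcasm}} = \mathrm{Sigmoid}(Z_{\mathrm{sarcasm}})$$
This case is the same as case (a) except that a CNN is used instead of an RNN.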
c. Standalone classifier using Bi-GRU with the following characteristics:
$$Z_{\mathrm{sentiment}} = \mathrm{FCLayer}_{\mathrm{sentiment}}(\mathrm{Bidirection}(\mathrm{GRU}(X_{\mathrm{sentiment}})))$$
$$Z_{\mathrm{sarcasm}} = \mathrm{FCLayer}_{\mathrm{sarcasm}}(\mathrm{Bidirection}(\mathrm{GRU}(X_{\mathrm{sarcasm}})))$$
$$S_{\mathrm{sentiment}} = \mathrm{Softmax}(Z_{\mathrm{sentiment}})$$
$$S_{\mathrm{sarcasm}} = \mathrm{Sigmoid}(Z_{\mathrm{sarcasm}})$$
This case is the same as the two previous cases except that Bi-GRU is used.
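d. Standalone classifier using Bi-LSTM with the following characteristics:
$$Z_{\mathrm{sentiment}} = \mathrm{FCLayer}_{\mathrm{sentiment}}(\mathrm{Bidirection}(\mathrm{LSTM}(X_{\mathrm{sentiment}})))$$
$$Z_{\mathrm{sarcasm}} = \mathrm{FCLayer}_{\mathrm{sarcasm}}(\mathrm{Bidirection}(\mathrm{LSTM}(X_{\mathrm{sarcasm}})))$$
$$S_{\mathrm{sentiment}} = \mathrm{Softmax}(Z_{\mathrm{sentiment}})$$
$$S_{\mathrm{sarcasm}} = \mathrm{Sigmoid}(Z_{\mathrm{sarcasm}})$$
This case also involves two standalone models, as in the previous cases, except that Bi-LSTM is used.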
e. Multi-task learning using Bi-LSTM with the following characteristics:
$$Z^{*} = \mathrm{FCLayer}^{*}(\mathrm{Bidirection}(\mathrm{LSTM}(X)))$$
$$S_{\mathrm{sentiment}} = \mathrm{Softmax}(Z^{*})$$
$$S_{\mathrm{sarcasm}} = \mathrm{Sigmoid}(Z^{*})$$
where * denotes the layers shared between the sentiment and sarcasm tasks.
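The difference between the standalone and multi-task variants lies in what is shared: in the multi-task case one representation feeds both output layers. The sketch below shows only this last step in plain Python. It assumes that each task has its own dense output layer (three units with softmax for sentiment, one unit with sigmoid for sarcasm) on top of the shared representation. The vector `z_shared` stands in for the output of the shared Bi-LSTM and fully connected layers, and all weights are random placeholders rather than trained values.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(weights, bias, vector):
    """One dense layer without activation: weights @ vector + bias."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def multitask_heads(z_shared, params):
    """Apply both task-specific output layers to the same shared vector."""
    sentiment_logits = dense(params["w_sentiment"], params["b_sentiment"], z_shared)
    sarcasm_logit = dense(params["w_sarcasm"], params["b_sarcasm"], z_shared)[0]
    return softmax(sentiment_logits), sigmoid(sarcasm_logit)


rng = random.Random(0)
DIM = 4  # hypothetical size of the shared representation
z_shared = [rng.uniform(-1, 1) for _ in range(DIM)]
params = {
    "w_sentiment": [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)],
    "b_sentiment": [0.0, 0.0, 0.0],
    "w_sarcasm": [[rng.uniform(-1, 1) for _ in range(DIM)]],
    "b_sarcasm": [0.0],
}

s_sentiment, s_sarcasm = multitask_heads(z_shared, params)
print(s_sentiment, s_sarcasm)
```

`s_sentiment` is a probability distribution over the three sentiment classes and `s_sarcasm` is a single probability. In training, gradients from both outputs would flow back into the same shared layers, which is what lets one task inform the other.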
The experiment results are given in Fig. 10. First, when the four standalone models are compared, it is evident that the conventional neural networks perform worse than the more recent Bi-LSTM network. The Bi-LSTM outperforms RNN, CNN, and Bi-GRU by margins of 0.4%, 0.1%, and 0.1%, respectively. It is essential to highlight that this model achieves an F1-score of 91% for sentiment classification and 92% for sarcasm classification. This improvement is achieved using memory gates that preserve important data while erasing useless data in the Bi-LSTM network. In addition, this network learns bi-directionally, so information from both the past and the future is preserved for better classification.
Next, we compare the standalone model with the proposed framework, which uses
multi-task learning with a Bi-LSTM network. The proposed method outperforms all the
other models in sentiment and sarcasm classification, with an F1-score of 94% and 93%,
respectively. Compared to the standalone Bi-LSTM model, the improvement is 3% and
1%, respectively, for sentiment and sarcasm classification. This improvement is because
multi-task learning utilizes a shared layer, reduces the risk of overfitting when computing
gradient descent, and ensures efficient learning. This also means that the sarcasm classifier successfully boosts the sentiment classifier’s performance. It can also be observed that the margin of improvement for the sentiment classifier is larger than that for the sarcasm classifier because sarcasm detection is a subtask of sentiment analysis.
Fig. 10 Experimental results using different model variants: a sentiment classification; b sarcasm classification
Table 3 Distribution of sentiment sentences

Dataset               Positive        Negative       Neutral
Reddit                15,830 (42%)    8277 (22%)     13,142 (35%)
Twitter US airline    2363 (16%)      9178 (63%)     3090 (21%)
Twitter               1103 (17%)      4001 (61%)     1430 (22%)

It is also important to note that the performance of the proposed model is consistent
when we analyze the precision and the recall values, in which the pattern is the same
as the F1-Score. The margin of improvement is also about 3% and 1%, respectively, for
sentiment and sarcasm classifications.
4.4 Experiment 2: Testing the Proposed Method Using Unbiased Datasets
In this experiment, we evaluate the performance of the proposed method using different
datasets and compare it with the Bi-LSTM standalone model, which is the best standalone model from the previous experiment. This experiment aims to analyze the performance of our model in analyzing unseen data and gauge how well the model does in
a real-life scenario. Three datasets obtained from Kaggle are used in this experiment, as
summarized in Table 3, and explained as follows.
a. Reddit dataset
The Reddit dataset was collected from the comment sections of Reddit posts. It is an unbalanced dataset in which the majority of the data has positive sentiment. This dataset fits the purpose of this experiment well because it comes from a different social media platform, which may have a different writing structure.
b. Twitter US Airline dataset [35]
This dataset was collected from Twitter and consists of reviews of US airlines written by their customers. The contributors were asked to classify each comment as positive, negative, or neutral, followed by the reasons for giving such comments. It is also an unbalanced dataset, with negative sentiment making up the majority of the data.
c. Twitter dataset
This dataset was scraped from Twitter using the Tweepy module. It is an unbalanced dataset, with negative sentiment making up the majority of the data. This dataset is the closest in nature to the dataset used to train the models.
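The imbalance described above can be checked directly from the counts in Table 3. The snippet below is a small sketch: the dictionary layout and function names are illustrative, and the assignment of the counts to the Twitter US airline and Twitter rows follows the majority classes stated in the dataset descriptions.

```python
TABLE_3 = {
    "Reddit": {"positive": 15830, "negative": 8277, "neutral": 13142},
    "Twitter US airline": {"positive": 2363, "negative": 9178, "neutral": 3090},
    "Twitter": {"positive": 1103, "negative": 4001, "neutral": 1430},
}


def distribution(counts):
    """Share of each label in percent, rounded to the nearest integer."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


def majority_label(counts):
    """Label with the largest number of sentences."""
    return max(counts, key=counts.get)


for name, counts in TABLE_3.items():
    print(name, distribution(counts), majority_label(counts))
```

The computed shares reproduce the percentages in Table 3, with positive sentiment as the majority class for Reddit and negative sentiment as the majority class for both Twitter datasets.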
Figure 11 shows the experimental results obtained through testing using different datasets. The F1-score is given in Fig. 11a. We can see that the Twitter dataset has an F1-score
of 58% for the standalone sentiment classifier and 63% for the proposed method, which
is the highest among the three datasets. We believe the reason for this better performance
is the nature of the dataset, which is the same as the data used for training but of broader
scope. The Reddit dataset obtained an F1-score of 55% for the standalone sentiment classifier and 61% for the proposed method, while the Twitter US airline dataset obtained an
F1-score of 52% for the standalone sentiment classifier and 54% for the proposed method.
Generally, the performance of the proposed model on unbiased data is relatively low, with an average F1-score of about 59%. To better understand this poor performance, we observe the neutral score of the proposed method on these datasets, as shown in Fig. 11b. It is essential to highlight that poor performance on one of the labels significantly affects the overall F1-score of the model. The neutral label has a deficient F1-score of 48%, 51%, and 44% for the Reddit, Twitter, and Twitter US airline datasets, respectively. The low F1-score is mainly due to the low recall on neutral sentiment, which is 33%, 36%, and 29% for the Reddit, Twitter, and Twitter US airline datasets, respectively. This suggests that many neutral sentences are assigned to other classes, that is, they are false negatives for the neutral label. Figure 12 shows the word cloud for the neutral label in these datasets. We can observe that the neutral words are mostly stop words, which are removed in the pre-processing stage, and consequently, the model has difficulty classifying neutral tweets. Nevertheless, the proposed method still outperforms the existing methods in analyzing unbiased datasets.
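The reported neutral-label F1-scores and recalls also imply a precision for each dataset. Rearranging Eq. 11 gives Precision = F1 × Recall / (2 × Recall − F1). The sketch below applies this to the figures quoted above. It is a back-of-the-envelope estimate, not a result reported in the paper, and it is sensitive to the rounding of the published percentages.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for the precision P."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 cannot exceed twice the recall")
    return f1 * recall / denominator


# (F1-score, recall) of the neutral label as quoted in the text.
NEUTRAL_SCORES = {
    "Reddit": (0.48, 0.33),
    "Twitter": (0.51, 0.36),
    "Twitter US airline": (0.44, 0.29),
}

implied = {name: implied_precision(f1, recall)
           for name, (f1, recall) in NEUTRAL_SCORES.items()}

for name, precision in implied.items():
    print(f"{name}: implied neutral precision of about {precision:.2f}")
```

The implied precision is roughly 0.88, 0.87 and 0.91 for the three datasets. If the quoted figures are accurate, the model is therefore usually right when it does predict neutral but does so too rarely, which agrees with the observation that the weakness lies in recall.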
5 Conclusions and Future Work
Sentiment analysis plays a vital role in the current digital world. It can provide useful information obtained from the Internet, especially social media, for various purposes. In this
paper, we tackle the problem of inaccurate sentiment analysis caused by the use of sarcasm
in some sentences. More specifically, a multi-task learning framework for simultaneous
sentiment analysis and sarcasm detection has been proposed. In this framework, sarcasm
detection aims to detect the sarcastic context in a sentence and improve the accuracy of
sentiment analysis. Two experiments using different datasets were conducted to evaluate the performance of the proposed framework, and the results show promising improvement. It can be concluded that sarcastic tweets do affect the performance of the sentiment analysis model, and adding an efficient sarcasm detection model can significantly improve the overall performance of the sentiment analysis model.
Fig. 11 Experimental results using unbiased datasets: a F1-Score; b neutral score
The performance of the proposed method relies on the dataset used for training. As demonstrated in the second experiment presented in the previous section, the performance of the proposed framework is poorer on unseen datasets because neutral sentiments return poor recall scores, which also leads to lower F1 scores. This happens because the stop words, which are neutral in nature, are removed during the pre-processing stage, making the neutral class less significant. To overcome this problem, a more appropriate pre-processing technique is needed to effectively remove stop words without reducing the number of neutral statements in the datasets.
Fig. 12 Word cloud for neutral label in datasets
Acknowledgements This work was supported by the Impact Oriented Interdisciplinary Research Grant
(IIRG) Programme, University of Malaya [IIRG002B-19IISS].
Author Contributions Conceptualization, Y.-Y.T., C.-O.C., J.-H.C.; Methodology, Y.-Y.T., C.-O.C., J.K.; Software, Y.-Y.T.; Investigation, Y.-Y.T., C.-O.C.; Data, Y.-Y.T., J.K.; Analysis, Y.-Y.T., C.-O.C.; Validation, Y.L.; Writing (Original Draft), Y.-Y.T.; Writing (Review), C.-O.C.; Supervision, C.-O.C.; Funding Acquisition, C.-O.C.
Funding This work was supported by the Impact Oriented Interdisciplinary Research Grant (IIRG) Programme, University of Malaya [IIRG002B-19IISS].
Data Availability The datasets generated during and/or analysed during the current study are available from
the corresponding author on reasonable request.
Code Availability The datasets generated during and/or analysed during the current study are available from
the corresponding author on reasonable request.
Declarations
Conflicts of interest The authors declare that they have no conflicts of interest or competing interests.
References
1. Statista Research Department. (2022, September 20). Internet and social media users in the world 2022. Statista. Retrieved October 6, 2022, from https://www.statista.com/statistics/617136/digital-population-worldwide/
2. Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., Zhang, L., Fan, G., Xu, J., Gu, X., Cheng, Z., Yu, T., Xia, J., Wei, Y., Wu, W., Xie, X., Yin, W., Li, H., Liu, M., & Cao, B. (2020). Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet, 395(10223), 497–506. https://doi.org/10.1016/s0140-6736(20)30183-5
3. American Psychological Association. (n.d.). APA: U.S. adults report highest stress level since early days of the COVID-19 pandemic. American Psychological Association. Retrieved October 6, 2022, from https://www.apa.org/news/press/releases/2021/02/adults-stress-pandemic
4. Online Reviews Stats & Insights. Podium. (n.d.). Retrieved October 6, 2022, from https://www.podium.com/resources/podium-state-of-online-reviews
5. De Choudhury, M., & Counts, S. (2012). The nature of emotional expression in social media: Measurement, inference and utility. Human Computer Interaction Consortium (HCIC).
6. Zhao, J., Liu, K., & Xu, L. (2016). Sentiment analysis: Mining opinions, sentiments, and emotions. Computational Linguistics, 42(3), 595–598. https://doi.org/10.1162/coli_r_00259
7. Pang, B., & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL ’04. https://doi.org/10.3115/1218955.1218990
8. Turney, P. D. (2001). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL ’02. https://doi.org/10.3115/1073083.1073153
9. Dave, K., Lawrence, S., & Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In: Proceedings of the Twelfth International Conference on World Wide Web - WWW ’03. https://doi.org/10.1145/775152.775226
10. Khadjeh Nassirtoussi, A., Aghabozorgi, S., Ying Wah, T., & Ngo, D. C. (2014). Text mining for market prediction: A systematic review. Expert Systems with Applications, 41(16), 7653–7670. https://doi.org/10.1016/j.eswa.2014.06.009
11. Burnap, P., Williams, M. L., Sloan, L., Rana, O., Housley, W., Edwards, A., Knight, V., Procter, R., & Voss, A. (2014). Tweeting the terror: Modelling the social media reaction to the Woolwich terrorist attack. Social Network Analysis and Mining. https://doi.org/10.1007/s13278-014-0206-4
12. Hogenboom, A., Heerschop, B., Frasincar, F., Kaymak, U., & de Jong, F. (2014). Multi-lingual support for lexicon-based sentiment analysis guided by semantics. Decision Support Systems, 62, 43–53. https://doi.org/10.1016/j.dss.2014.03.004
13. Reyes, A., & Rosso, P. (2013). On the difficulty of automatically detecting irony: Beyond a simple case of negation. Knowledge and Information Systems, 40(3), 595–614. https://doi.org/10.1007/s10115-013-0652-8
14. Arunachalam, R., & Sarkar, S. (2013). The new eye of government: Citizen sentiment analysis in social media. In: Proceedings of the IJCNLP 2013 Workshop on Natural Language Processing for Social Media (SocialNLP), 23–28.
15. Maynard, D., & Greenwood, M. A. (2014). Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. LREC 2014 Proceedings.
16. Matsumoto, S., Takamura, H., & Okumura, M. (2005). Sentiment classification using word subsequences and dependency sub-trees. Advances in Knowledge Discovery and Data Mining. https://doi.org/10.1007/11430919_37
17. Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 142–150.
18. Bespalov, D., Bai, B., Qi, Y., & Shokoufandeh, A. (2011). Sentiment classification based on supervised latent N-gram analysis. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management - CIKM ’11. https://doi.org/10.1145/2063576.2063635
19. Abbasi, A., Chen, H., & Salem, A. (2008). Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums. ACM Transactions on Information Systems, 26(3), 1–34. https://doi.org/10.1145/1361684.1361685
20. Yanagimoto, H., Shimada, M., & Yoshimura, A. (2013). Document similarity estimation for sentiment analysis using neural network. 2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS). https://doi.org/10.1109/icis.2013.6607825
21. Chen, T., Xu, R., He, Y., Xia, Y., & Wang, X. (2016). Learning user and product distributed representations using a sequence model for sentiment analysis. IEEE Computational Intelligence Magazine, 11(3), 34–44. https://doi.org/10.1109/mci.2016.2572539
22. Abdi, A., Shamsuddin, S. M., Hasan, S., & Piran, J. (2019). Deep learning-based sentiment classification of evaluative text based on multi-feature fusion. Information Processing & Management, 56(4), 1245–1259. https://doi.org/10.1016/j.ipm.2019.02.018
23. Kumar, A., Srinivasan, K., Cheng, W.-H., & Zomaya, A. Y. (2020). Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data. Information Processing & Management, 57(1), 102141. https://doi.org/10.1016/j.ipm.2019.102141
24. Yafoz, A., & Mouhoub, M. (2021). Sentiment analysis in Arabic social media using deep learning models. 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). https://doi.org/10.1109/smc52423.2021.9659245
25. Yousif, A., Niu, Z., Chambua, J., & Khan, Z. Y. (2019). Multi-task learning model based on recurrent convolutional neural networks for citation sentiment and purpose classification. Neurocomputing, 335, 195–205. https://doi.org/10.1016/j.neucom.2019.01.021
26. Yunitasari, Y., Musdholifah, A., & Sari, A. K. (2019). Sarcasm detection for sentiment analysis in Indonesian tweets. IJCCS (Indonesian Journal of Computing and Cybernetics Systems), 13(1), 53. https://doi.org/10.22146/ijccs.41136
27. A, C. K. (2019). Twitter and reddit sentimental analysis dataset. Kaggle. Retrieved October 7, 2022, from https://www.kaggle.com/datasets/cosmos98/twitter-and-reddit-sentimental-analysis-dataset
28. Misra, R. (2019). News headlines dataset for sarcasm detection. Kaggle. Retrieved October 7, 2022, from https://www.kaggle.com/datasets/rmisra/news-headlines-dataset-for-sarcasm-detection
29. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
30. Crawshaw, M. (2020). Multi-task learning with deep neural networks: A survey. arXiv preprint arXiv:2009.09796.
31. Arbel, N. (2020). How LSTM networks solve the problem of vanishing gradients. Medium. Retrieved October 7, 2022, from https://medium.datadriveninvestor.com/how-do-lstm-networks-solve-the-problem-of-vanishing-gradients-a6784971a577
32. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15, 1929–1958.
33. Agarap, A. F. (2018). Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375.
34. Hochreiter, S. (1998). The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 06(02), 107–116. https://doi.org/10.1142/s0218488598000094
35. Figure Eight. (2019). Twitter US airline sentiment. Kaggle. Retrieved October 7, 2022, from https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.
Yik Yang Tan is currently a student at the Department of Electrical Engineering, University of Malaya. His research topics include the application of machine learning techniques and big data in solving engineering problems, making the most of state-of-the-art techniques.
Chee‑Onn Chow received his Bachelor of Engineering (honors) and Master of Engineering Science degrees from University of Malaya, Malaysia in 1999 and 2001, respectively. He received his Doctorate of Engineering from Tokai University, Japan in 2008. He joined the Department of Electrical Engineering, University of Malaya in 1999, and is currently an Associate Professor in the same department. His research areas include wireless communications, multimedia transmission, data analytics and machine learning. He is a member of IET and a senior member of IEEE. He is a registered Professional Engineer (Board of Engineers Malaysia) and Chartered Engineer (IET).
Jeevan Kanesan received his B.S. degree in Electrical & Electronics Engineering from University Technology Malaysia, Malaysia, in 1999, and his M.S. and Ph.D. degrees in mechanical engineering from University Science Malaysia, Malaysia, in 2003 and 2006, respectively. He worked as an equipment engineer at Carsem Semiconductor, Malaysia between 2000 and 2001 and as an IC design engineer in the thermomechanical department, Intel Technology Sdn. Bhd., Malaysia from 2006 to 2008. He has been with the Electrical Engineering department of the University of Malaya, Malaysia since 2008. He has published over 50 peer-reviewed technical papers in international journals. His research interests include nature-inspired metaheuristics, optimization, CAD of VLSI circuits, and design and analysis of algorithms.
Joon Huang Chuah received the B.Eng. (Hons.) degree from the Universiti Teknologi Malaysia, the M.Eng. degree from the National University of Singapore, and the M.Phil. and Ph.D. degrees from the University of Cambridge. He is currently Head of VIP Research Group at
the Department of Electrical Engineering, University of Malaya. He
was the Honorary Treasurer of IEEE Computational Intelligence Society (CIS) Malaysia Chapter and the Honorary Secretary of IEEE
Council on RFID Malaysia Chapter. He is the Vice Chairman of the
Institution of Engineering and Technology (IET) Malaysia Network.
He is a Fellow and the Honorary Secretary of the Institution of Engineers, Malaysia (IEM). His main research interests include image processing, computational intelligence, IC design, and scanning electron
microscopy.
YongLiang Lim received the Bachelor's and Master's degrees in Electronic Engineering from the University of Science Malaysia (USM) Engineering Campus, Malaysia, in 2016 and 2019, respectively. He is currently a Ph.D. student at the Department of Electrical Engineering, University of Malaya. His research interests are computational intelligence and context-awareness.