Sentiment Analysis Combination
in Terrorist Detection on Twitter: A Brief
Survey of Approaches and Techniques
Esraa Najjar and Salam Al-augby
Abstract Terrorism is a major concern for many governments and people, especially now that social media platforms such as Twitter offer new technologies to those who plan it. Terrorist groups use many techniques to carry out their actions and plans. Technology can play an important role in providing accurate predictions of terrorist activities. This work examines sentiment analysis of terrorism-related content on Twitter, because early detection of terrorist activity is very important for responding to recent attacks and for combating the spread of global terrorist activity. It studies the techniques used to analyze terrorist activity data on Twitter and is based on 17 articles that used Twitter to study terrorism for different purposes, highlighting the different techniques used. The survey shows that machine learning techniques, such as AdaBoost, support vector machine, maximum entropy, Naive Bayes, and decision tree algorithms, were used most often for sentiment analysis, with good accuracy depending on the data used. Few papers analyzed tweets in Arabic compared with English, because Arabic is complex to parse and sentiment in Arabic is complex to analyze, which makes the task more challenging.
Keywords Sentiment analysis · Terrorism · ISIS · Twitter · Terrorist detection
1 Introduction
Terrorism has become a worldwide issue. Although terrorism is not a modern phenomenon, terrorist groups are still very active in different countries, with numerous supporters around the world, and terrorists continue to adopt sophisticated yet low-resource tactics to launch highly violent attacks and to recruit people. The FBI defined terrorism as “the unlawful use of force and violence against people or property to intimidate or compel a government, in furtherance of political or social objectives or Sectarianism” [1]. With today's technological progress, social media platforms such as Twitter are used by terrorist groups for spreading their ideas, recruitment, funding, and incitement to terrorist operations [2]. There are many terrorist groups and organizations, but the most prevalent one in this area is ISIS. It goes by different names, such as Daesh, an acronym of the Arabic phrase “al-Dawla al-Islamiya al-Iraq al-Sham” (Islamic State of Iraq and Sham) [3]; in English it is abbreviated as ISIS, which stands for “Islamic State of Iraq and Syria,” or it is called IS, for “Islamic State” [4]. ISIS succeeded in using this modern technology for recruitment, message dissemination, and even plotting attacks, turning traditional war into an electronic war that uses social media platforms to strengthen its presence in the world [5].
E. Najjar (B) · S. Al-augby
Faculty of Computer Science and Mathematics, Department of Computer Science, University of Kufa, Najaf, Iraq
e-mail: esraanajjar13@gmail.com
S. Al-augby
e-mail: salam.alaugby@uokufa.edu.iq
© Springer Nature Singapore Pte Ltd. 2021
R. Kumar et al. (eds.), Research in Intelligent and Computing in Engineering, Advances in Intelligent Systems and Computing 1254,
https://doi.org/10.1007/978-981-15-7527-3_23
In this work, we examine the strategies and techniques that have been applied to analyze, understand, and interpret sentiment in order to classify tweets, and we found that machine learning algorithms are the most used and the most accurate for sentiment analysis. The benefit of this work is that it surveys the papers that assist in detecting terrorists, or even terrorist supporters, on Twitter so that their accounts can be closed.
2 Twitter
Twitter is considered one of the most famous social networks in the world. It offers a microblogging service that allows a user to send “tweets” that can be retweeted or liked by other users; tweets can be posted directly on Twitter, by short message, or through other applications. These updates show up on the user's Twitter page, and friends can read them on their home page or by visiting the user's profile [6].
Twitter is the most popular fast-growing microblogging service. The number of tweets increased from 400,000 in the second quarter of 2007 to 100 million in the first quarter of 2008, and in February 2010 Twitter users sent 50 million tweets per day [7]. As of June 2010, about 65 million tweets were published daily, equal to about 750 tweets per second [8], and the service continued to grow to more than 330 million monthly active users in the first quarter of 2019, as shown in Fig. 1. Twitter ranked second in the list of most popular social networking sites in January 2009, after ranking 22nd earlier [9].
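The per-second figures quoted above follow from simple division of the daily volumes by the number of seconds in a day. A minimal check, using only the daily volumes reported in [7] and [8] (the function name is ours, for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tweets_per_second(tweets_per_day: float) -> float:
    """Convert a daily tweet volume into an average per-second rate."""
    return tweets_per_day / SECONDS_PER_DAY


rate_feb_2010 = tweets_per_second(50_000_000)  # daily volume reported in [7]
rate_jun_2010 = tweets_per_second(65_000_000)  # daily volume reported in [8]
print(round(rate_feb_2010), round(rate_jun_2010))
```

Both results are consistent with the rounded rates cited in the text and in the title of [7].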
Twitter has continued to grow and to roll out updates. For example, Twitter announced in 2016 that images and videos could be posted without counting toward the 140-character limit, and that attachments and links would no longer count toward the character limit either; in 2017, Twitter expanded the number of characters allowed in a tweet from 140 to 280 [11]. These changes helped make Twitter one of the most popular social networking sites worldwide. Twitter is considered one of the most suitable social platforms for news, public relations, media, political and terrorist activities, etc. For all these reasons, terrorist organizations have exploited Twitter as a perfect platform to spread their ideas, propagate their messages, recruit new members, and even plot attacks. Twitter can thus be a potential facilitator of terrorist activities, but it can also be seen as a robust deterrent to them, for example through the civilian response during the Mumbai and 2009 Jakarta terrorist attacks [12].
Fig. 1 Number of monthly active Twitter users worldwide from the first quarter of 2010 to the first quarter of 2019 (in millions). Source: Twitter [10]
3 Sentiment Analysis (SA)
Sentiment analysis (SA) is the process of mining attitudes, opinions, views, and emotions from tweets or other written text. These terms differ: a sentiment is an opinion that expresses personal feelings; an opinion is a conclusion that is open to argument (because different experts have different opinions); a view is a personal opinion; and a belief is deliberate acceptance and ideological assent [13].
Natural language processing (NLP), statistics, or machine learning techniques can be used to perform SA on the text of Twitter users and then classify its polarity. The task of sentiment analysis is to find out whether the opinion expressed in a text is positive, negative, or neutral; the purpose of SA is therefore to find opinions and determine the sentiments they express about current local and international issues and events. Such information is useful for understanding people's opinions about terrorism and crime and about other areas such as political elections, sports, trademarks, products, tourism, and celebrities [14]. The sentiment classification techniques are illustrated in Fig. 2.
Fig. 2 Sentiment classification techniques. Source Our work based on [15]
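To make the polarity task concrete, the following is a minimal sketch of a dictionary-based (lexicon) classifier that labels a text as positive, negative, or neutral. The word lists are hypothetical and far too small for real use; practical systems rely on much larger lexical resources, and this sketch is not taken from any of the surveyed papers.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"peace", "safe", "hope", "support", "good"}
NEGATIVE = {"attack", "fear", "threat", "kill", "bad"}


def polarity(text: str) -> str:
    """Label text by the difference between positive and negative word counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("We hope for peace"))
print(polarity("Fear after the attack"))
print(polarity("The meeting is at noon"))
```

A lexicon approach of this kind needs no training data, which is its main appeal; its weakness is that it ignores context, negation, and sarcasm.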
4 Literature Review
Many works use sentiment analysis for terrorist detection, with different techniques. Cheong and Lee [12] proposed a framework consisting of four phases (breaking news; data harvesting and spam filtering; sentiment detection and demographic analysis; and data mining and reporting). This work used Twitter's microblogging service as a multi-faceted data source for sentiment data mining (using the self-organizing map algorithm for clustering) and for demographic analysis of the civilian response to terrorism, tested on synthetic trial data. They showed that the proposed framework produced meaningful graphs of information about terrorism scenarios that can help detect potential terrorist threats and responses, and that mining unstructured content can extract deep knowledge from Twitter messages. The limitation of the study is its need for real-world data to support a robust analysis.
Bolla [16] analyzed geographic data in tweets to provide a clear view of crime trends in various cities. Sentiment analysis techniques (an ANEW-based technique and a deep learning model) were applied to 100,000 tweets to analyze the crime intensity of a particular location, and the results were positive. The advanced sentiment analysis algorithm helps to distinguish tweets about violent crime within a particular location and may be more accurate when applied to other media, e.g., Facebook, Google+, Tumblr, and MySpace. The results of this method helped to detect crime patterns, but the sentiment analysis techniques did not give appropriate results every time, and researchers should consider how to improve them.
In Omer's work [17], three datasets were collected: one of tweets from accounts supporting ISIS (TW-PRO), one of tweets from anti-ISIS accounts (TW-CON), and one of random tweets that have no relationship to ISIS (TW-RAND). In total, 135,608 tweets were collected, and three types of features were used: stylometric, time-based, and sentiment-based. After the feature selection process, 619 features remained. The study classified tweets as radical or not by applying support vector machine (SVM), Naive Bayes (NB), and AdaBoost. The results differed depending on the datasets used: the algorithms gave better results on TW-PRO and TW-RAND than on TW-PRO and TW-CON. With all datasets combined, AdaBoost was slightly more accurate, with 100% correctly classified instances, while NB reached 99.9% and SVM 99.1%.
Kaati et al. [18] applied AdaBoost, a machine learning technique, with two sets of features, data-dependent and data-independent, to classify Arabic and English datasets of tweeps and tweets related to jihadism. The data were collected in two ways: from postings on Shumukh Al-Islam and from tweets containing hashtags related to jihadists, especially ISIS. The English tweeps and tweets were classified well, and better than the Arabic data. For English tweets, the final results had high accuracy, precision, and recall: accuracy was 0.9907 with data-dependent features, 0.9882 with data-independent features, and 0.9951 with both feature sets. For Arabic, accuracy, precision, and recall were lower: accuracy was 0.824 with data-independent features, 0.8466 with data-dependent features, and 0.8638 with both feature sets.
Magdy et al. [19] used a Twitter dataset to study the antecedents of ISIS support globally and over time, collecting 3.1 million Arabic tweets that refer to ISIS and classifying them as pro-ISIS or anti-ISIS. The classification relied on a simple heuristic: use of the group's full name (Islamic State) is a strong indicator of support, while use of an abbreviation (ISIS) usually indicates opposition. The labeled data were then used to predict future support for or opposition to ISIS. An SVM classifier was trained with the SVMLight application using a linear kernel and default parameters and reached an accuracy of 87%; the features were bag-of-words features including individual terms, hashtags, and user mentions, together with geographic distribution. The researchers concluded that support for ISIS stems from disappointment with the outcomes of the Arab Spring, while opposition to ISIS is connected with support for insurgent groups.
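The naming heuristic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the study worked on Arabic tweets, so the English patterns below are hypothetical stand-ins, and tweets that use both forms or neither are left unlabeled here.

```python
import re

# Hypothetical English stand-ins for the Arabic name forms used in the study.
FULL_NAME = re.compile(r"\bislamic state\b")
ABBREVIATION = re.compile(r"\bisis\b")


def label_stance(tweet: str):
    """Return 'pro', 'anti', or None when the naming signal is absent or mixed."""
    text = tweet.lower()
    uses_full = bool(FULL_NAME.search(text))
    uses_abbrev = bool(ABBREVIATION.search(text))
    if uses_full and not uses_abbrev:
        return "pro"
    if uses_abbrev and not uses_full:
        return "anti"
    return None


print(label_stance("The Islamic State issued a statement"))
print(label_stance("ISIS must be stopped"))
print(label_stance("Crisis talks continue"))
```

Word-boundary matching matters here: a plain substring test would wrongly match the abbreviation inside unrelated words such as "crisis".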
Ngoge [20] worked on four objectives: first, to determine the current state of terrorism; second, to identify terrorist activities; third, to study the data mining techniques (Naive Bayes classifier, maximum entropy, SVM, and lexicon-based approaches) used in crime detection; and fourth, to develop and test the system. The model reached an accuracy of 73%, and the average recall and precision for positive text were 15% and 60%, respectively, on 346 terrorism-related tweets collected over only seven days in Kenya. The distribution of sentiments on the map, indicated by markers, represented the patterns and trends of terrorist activities in Kenya; law enforcement officers can use these indications as investigative leads and as information that helps them disrupt, expose, and uncover terrorists' networks and their structure effectively.
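Accuracy, precision, and recall are all derived from the same four confusion-matrix counts, which is why a model can show moderate accuracy together with very low recall. The sketch below uses hypothetical counts (not data from the study), chosen only so that the three measures land near the reported 73%, 60%, and 15%.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: few positives are found (low recall), but most
# negatives are classified correctly, which keeps accuracy fairly high.
m = classification_metrics(tp=3, fp=2, fn=17, tn=48)
print({name: round(value, 2) for name, value in m.items()})
```

When positive examples are rare, accuracy is dominated by the negative class, so recall and precision are the more informative figures for a detection task.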
Mutlu et al. [21] applied K-nearest neighbor (KNN), Naive Bayes, and C4.5 decision tree algorithms to a dataset of 95,578 tweets belonging to 3,321 users, collected using the Twitter REST API, and used the Hadoop/Mahout and Hadoop/Hive platforms for big data processing. The Weka tool was used to evaluate the proposed model: C4.5 reached 89% accuracy, while Naive Bayes reached 79% and KNN 83%. The study found that the malicious users are “trolls” who stir up trouble around a sensitive issue like terrorism, and that the C4.5 algorithm gave the best performance in detecting them so that they can be eliminated.
Ali [22] used data mining tools to extract and identify the vocabulary of terrorist organizations by analyzing tweets related to ISIS, in Arabic and English, collected through the Twitter API. The K-means algorithm was used to cluster the collected tweets and find recurring words by counting word frequency; word clouds helped to guess the number of clusters, and the guesses were confirmed or refined with the Elbow method. The next step was to apply K-means to identify the specific user accounts that frequently used the selected terms. As the last step, these user accounts were verified using network graphs of each user's network, created with NodeXL and Gephi, which were chosen because they are free and offer easy-to-use visualization. The study showed that these tools, if fully implemented, can greatly enhance law enforcement efforts to eliminate terrorist groups.
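The Elbow method mentioned above picks the number of clusters at the point where adding another cluster stops reducing the within-cluster error by much. The following is a minimal sketch on one-dimensional toy data, assuming a simple deterministic initialisation; the values are hypothetical per-account term counts, not data from the study, which clustered tweets by their words.

```python
def kmeans_1d(points, k, max_iter=50):
    """Lloyd's algorithm on 1-D data; returns (centroids, inertia)."""
    pts = sorted(points)
    n = len(pts)
    # Deterministic initialisation: k evenly spaced values from the sorted data.
    centroids = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in pts)
    return centroids, inertia


# Hypothetical counts of a selected term per account (three obvious groups).
term_counts = [1, 2, 3, 10, 11, 12, 20, 21, 22]
inertias = {k: kmeans_1d(term_counts, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 1))
```

On this toy data the inertia falls steeply up to k = 3 and barely changes afterwards, so the elbow sits at three clusters, matching the three groups visible in the data.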
Mirani and Sasi [23] used a hybrid method that combines data mining algorithms with geolocation, presenting in their paper “Sentiment analysis of ISIS-related Tweets using Absolute location (SITA)” a novel way of classifying ISIS-related tweets by polarity. The sentiment analysis was done with the “Jeffrey Breen” algorithm. They compared the hashtags #ISLAMICSTATE, #ISIS, #ISIL, #DAESH, and #IS. Algorithm performance was measured by accuracy, F-measure, recall, and precision. Among the data mining algorithms (support vector machine, maximum entropy, random forest, bagging, and decision trees), accuracy was highest with #ISIL, followed by geolocation. For every algorithm, the average accuracy after tenfold cross-validation was more than 90%. The best result was 99% accuracy for maximum entropy, validated on #ISIL.
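The tenfold evaluation reported above is usually carried out as k-fold cross-validation: the data are split into k parts, each part serves once as the test set while the rest is used for training, and the k scores are averaged. A minimal sketch of the splitting logic follows; the function names are ours, and the `evaluate` callback is a placeholder for whatever training and scoring routine is used.

```python
def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples: int, evaluate, k: int = 10) -> float:
    """Average a caller-supplied evaluate(train_idx, test_idx) over k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])
```

In practice the sample order is shuffled before splitting, so that each fold contains a representative mix of classes; the sketch omits shuffling to stay deterministic.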
Ferrara et al. [24] used a dataset drawn from more than 25 thousand manually identified users with millions of tweets. The machine learning algorithms used were logistic regression and random forests, with the most important features applied to three tasks: radical user identification, prediction of radical content adoption, and prediction of how radical users and regular users interact. Across the different scenarios, the best performance, measured by AUC (area under the ROC curve), was more than 93% for the detection of radical users, above 80% for the prediction of content adoption, and more than 72% for the interaction between these types of users.
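AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; the labels and scores are hypothetical, and the pairwise loop is meant for clarity rather than for large datasets.

```python
def roc_auc(labels, scores) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores: one positive is ranked below one negative.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.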
Azrina Azizan and Abdul Aziz [25] conducted a comparative study of sentiment analysis techniques according to their results. In this study, machine learning had better accuracy than a lexicon-based approach, so to improve the sentiment analysis in their proposed system they selected the Naive Bayes technique to improve the accuracy of the terrorism detection process. One strong point of this approach is that, after sentences are classified into three categories (positive, negative, and neutral), they are reclassified: the new classification of a tweet is compared with the previous ones for a specific account holder, based on the sentiment scores of the latest and previous sentences.
Wang et al. [26] used machine learning techniques (support vector machine (SVM), maximum entropy (ME), tree, bagged tree, boosted tree, random forest (RF), neural network (NN), and Naive Bayes (NB)) with unigram, bigram, and trigram features. These features were used to classify a dataset of more than 700,000 tweets purchased from a third party (GNIP) that is authorized to resell historical tweets. The data were obtained with filter rules based on time period, keywords, and geographic location. The GunsOnTwitter website presents a tool for interactively visualizing public sentiment over different time scales and by state. One of the main findings of this paper is that machine learning methods can analyze a large dataset of tweets effectively for sentiment analysis. Based on the reported results, RF gave the highest accuracy at 92%. Bagged tree and boosted tree ranked second with almost the same accuracy, 88.5%. SVM and n-gram models gave 85% accuracy. Adding higher-order n-grams did not yield better features for this method, so the performance remained the same.
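Unigram, bigram, and trigram features are simply counts of every run of one, two, or three consecutive tokens in a text. A minimal sketch follows, assuming whitespace tokenization and a made-up example sentence; real pipelines usually add cleaning steps such as removing URLs and punctuation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text: str, max_n: int = 3) -> Counter:
    """Count unigrams up to max_n-grams in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


features = ngram_features("guns on twitter today")
print(sorted(features))
```

Each added n-gram order enlarges the feature space considerably, which is one reason higher orders do not always improve a classifier.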
Ahmed and Qadoos [27] collected data through the Twitter API. As a first step they used SentiWordNet and the TextBlob library, but these were not accurate; they therefore used machine learning algorithms, with random forest reaching an accuracy of up to 83% and a multinomial Naive Bayes classifier reaching an accuracy between 85 and 89% for detecting terrorism.
The main aim of the study by Ruhrberg et al. [28] was to identify possible differences between countries in the emotions of ISIS-related tweets. They applied sentiment analysis to 500,000 tweets on the subject of the Islamic State, collected from the Twitter streaming API, to determine whether the sentiment toward it was positive or negative. The analysis used the AFINN lexicon and the SentiStrength method. Political systems, geographic location, distance to the area where the Islamic State is active, and terrorist attacks are the factors that affect the SA. The most important result of this paper is that a negative attitude toward Daesh predominates, while positive or neutral sentiment appears to a smaller degree and comes from some specific countries.
De Smedt et al. [29] proposed a method to automatically detect online jihadist hate speech using NLP and machine learning (ML) techniques on data collected from October 2014 to December 2016. The model was trained with the LIBSVM machine learning algorithm on balanced training data. The training phase was conducted on a corpus of 45,000 subversive tweets. The accuracy varied by language, e.g., 84% for Arabic, 79% for English, 80% for French, 80% for Farsi, and 81% for Portuguese, with an overall accuracy of 82% (F1-score). This study helps to shed light on online hate messages, a relatively new phenomenon that needs to be studied, especially as there are not yet adequate regulations or technological answers for it.
Hernandez-Suarez et al. [30], in their work “Social Sentiment Sensor in Twitter for Predicting Cyber-Attacks Using ℓ1 Regularization,” used a social sentiment sensor in Twitter to predict cyber-attacks. They collected 1,800,000 tweets in English through historical retrieval by querying Twitter search endpoints, and classified these tweets as negative, positive, or security-oriented. ℓ1 regularization was used for the classifiers (Naive Bayes, maximum entropy, and support vector machines). The proposed strategy could serve to warn of potential cyber-attacks.
Alkhalisy and Jehlol [31] collected about 10,322 tweets matching terrorism keywords, cleaned and preprocessed them, and then converted the data into a corpus. Two tasks were carried out in this study; the first was analyzing Twitter data and sentiment mapping with GeoJSON, which is used to locate terrorists. A word list of synonyms and antonyms associated with terrorism was obtained from a dictionary and then classified as positive or negative. The suggested methods are based on bag-of-words features, scoring each text by summing the scores of its words over the tweets that make up the training data. The classifier was Naive Bayes, which disregards the position of a word in the document. Based on the training data, Naive Bayes classified each tweet as positive, negative, or neutral, and 7122 tweets were classified as negative.
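A bag-of-words Naive Bayes classifier of the kind described above can be sketched in a few lines. This is a generic multinomial Naive Bayes with Laplace smoothing, not the authors' implementation; the training sentences are invented for illustration, and words unseen in training are simply skipped at prediction time.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in doc.lower().split():
                if word in self.vocab:  # word order is ignored (bag of words)
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, for illustration only.
train_docs = [
    "bomb attack kills many", "threat of attack spreads fear",
    "peace talks bring hope", "community support and hope",
    "meeting scheduled for monday",
]
train_labels = ["negative", "negative", "positive", "positive", "neutral"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("new attack brings fear"), model.predict("hope for peace"))
```

Because each word contributes independently to the class score, the model treats a tweet as an unordered collection of words, which is the simplification that gives Naive Bayes its name.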
5 Conclusion
Terrorism has no borders and is not limited to one country. It exists everywhere and is carried out by people from various regions who speak different languages and belong to different religions. The purpose of this study was to show the methods that can give a better understanding of Twitter data with the help of SA, using ML algorithms (Bayesian networks, Naive Bayes classification, maximum entropy, neural networks, support vector machine) and lexical methods (dictionary-based approach, corpus-based approach), in order to determine the approach with the highest accuracy. The main finding is that the accuracy of the results depends on the size of the data, the language of the data, the type of features used, and the author factor, as shown in Fig. 1, which sets out the reviewed studies with an explanation of the algorithms used, the datasets, the sentiment classifiers, conclusions, strengths, and weaknesses. One can notice that most researchers used ML approaches because they gave better results than lexicon-based approaches; the most used algorithms were support vector machine and Naive Bayes, and the highest accuracy was achieved with AdaBoost and maximum entropy. From our viewpoint, the weaknesses of the collected works can be summarized as follows: the training set may be very small or unrepresentative, the data may be noisy, or the features may be irrelevant. In addition, the model should be neither too simple, in which case the system underfits, nor too complex, in which case it overfits strongly. For all of the above reasons, there is an attempt to propose a hybrid method combining machine learning and lexicon-based approaches, or another hybrid method, that can give better results.
References
1. Best S, Nocella A, Anthony J (2004) Defining terrorism. Anim Lib Philos Policy J 2(1):1–18
2. Bodine-Baron E, Helmus TC, Magnuson M, Winkelman Z (2016) Examining ISIS support and
opposition networks on Twitter
3. Oakley N, Chakrabati S (2015) What does Daesh mean? ISIS ‘threatens to cut out the tongues’
of anyone using this word. Mirror 2 (2015)
4. Wood G (2015) What ISIS really wants. The Atlantic 315(2):78–94
5. Badawy A, Ferrara E (2018) The rise of jihadist propaganda on social networks. J Comput Soc
Sci 1(2):453–470
6. Java A, Song X, Finin T, Tseng B (2007) Why we twitter: understanding microblogging usage
and communities. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on
web mining and social network analysis, pp 56–65
7. Beaumont C (2011) Twitter users send 50 million tweets per day–almost 600 tweets are sent
every second through the microblogging site, according to its own metrics. Dly Telegr (London)
7
8. Garrett S (2010) Big goals, big game, big records. Twitter Blog
(http//blog.twitter.com/2010/06/biggoals-big-game-big-records.html)
9. Lunden I (2012) Analyst: Twitter passed 500 m users in June 2012, 140 m of them in US;
Jakarta ‘Biggest Tweeting’ city. TechCrunch RSS 30
10. I. Twitter (2018) Number of monthly active Twitter users worldwide from 1st quarter 2010 to
4th quarter 2017 (in millions). Technical report, Statista
11. Rosen A (2017) Tweeting made easier. Nov 2017
12. Cheong M, Lee VCS (2011) A microblogging-based approach to terrorism informatics: exploration and chronicling civilian sentiment and response to terrorism events via Twitter. Inf Syst
Front 13(1):45–59
13. Liu B (2012) Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol 5(1):1–167
14. Liu B (2015) Opinions, sentiment, and emotion in text. Cambridge University Press, Cambridge
15. Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a
survey. Ain Shams Eng. J. 5(4):1093–1113
16. Bolla R (2014) Crime pattern detection using online social media. Masters theses
17. Omer E (2015) Using machine learning to identify jihadist messages on Twitter. Examensarbete
18. Kaati L, Omer E, Prucha N, Shrestha A (2016) Detecting multipliers of Jihadism on Twitter.
In: Proceedings of 15th IEEE international conference on data mining workshops (ICDMW 2015),
pp 954–960
19. Magdy W, Darwish K, Weber I (2016) #FailedRevolutions: using Twitter to study the
antecedents of ISIS support. First Monday 21(2)
20. Ngoge LA (2016) Real–time sentiment analysis for detection of terrorist activities in Kenya.
Strathmore University
21. Mutlu B, Mutlu M, Oztoprak K, Dogdu E (2016) Identifying trolls and determining terror
awareness level in social networks using a scalable framework. In: Proceedings of 2016 IEEE
international conference on big data, pp 1792–1798
22. Ali GA (2016) Identifying terrorist affiliations through social network analysis using data
mining techniques
23. Mirani TB, Sasi S (2016) Sentiment analysis of ISIS related Tweets using absolute location.
In: 2016 international conference on computational science and computational intelligence
(CSCI), pp 1140–1145
24. Ferrara E, Wang WQ, Varol O, Flammini A, Galstyan A (2016) Predicting online extremism,
content adopters, and interaction reciprocity. Lecture notes in computer science (including
subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol
10047 LNCS, pp 22–39
25. Azrina Azizan S, Abdul Aziz I (2017) Terrorism detection based on sentiment analysis using
machine learning. J Eng Appl Sci 12(3):691–698
26. Wang N, Varghese B, Donnelly PD (2017) A machine learning analysis of Twitter sentiment
to the Sandy Hook shootings. In: Proceedings of 2016 IEEE 12th International Conference on
e-Science, pp 303–312
27. Ahmed S, Qadoos M. Terrorism detection by tweet sentimental analysis
28. Ruhrberg SD, Kirstein G, Habermann T, Nikolic J, Stock WG (2018) #ISIS—a comparative
analysis of country-specific sentiment on Twitter. Open J Soc Sci 06(06):142–158
29. De Smedt T, De Pauw G, Van Ostaeyen P (2018) Automatic detection of online jihadist hate
speech, Feb 2018
30. Hernandez-Suarez A et al (2018) Social sentiment sensor in Twitter for predicting cyber-attacks
using ℓ1 regularization. Sensors (Basel) 18(5)
31. Alkhalisy M, Jehlol H (2018) Terrorist affiliations identifying through Twitter social media
analysis using data mining and web mapping techniques, vol 13