Sentiment Analysis Combination in Terrorist Detection on Twitter: A Brief Survey of Approaches and Techniques

Esraa Najjar and Salam Al-augby

E. Najjar (corresponding author) · S. Al-augby, Faculty of Computer Science and Mathematics, Department of Computer Science, University of Kufa, Najaf, Iraq. E. Najjar e-mail: esraanajjar13@gmail.com. S. Al-augby e-mail: salam.alaugby@uokufa.edu.iq
© Springer Nature Singapore Pte Ltd. 2021. R. Kumar et al. (eds.), Research in Intelligent and Computing in Engineering, Advances in Intelligent Systems and Computing 1254, https://doi.org/10.1007/978-981-15-7527-3_23

Abstract Terrorism is a major concern for many governments and people, especially as terrorist groups make use of social media such as Twitter and the new technologies behind it. Terrorists use many techniques to carry out their actions and plans. Technology can play an important role in providing accurate predictions of terrorist activities. Here, we examine how sentiment analysis has been applied to terrorism-related tweets, because the early detection of terrorist activity is very important for responding to recent attacks and for combating the spread of global terrorist activity. This work studies techniques for the effective analysis of terrorist activity data on Twitter. It is based on 17 articles that used Twitter to study terrorism for different purposes, and it highlights the different techniques they used. The survey shows that machine learning techniques, such as AdaBoost, support vector machine, maximum entropy, Naive Bayes, and decision tree algorithms, were the most widely used for sentiment analysis, with good accuracy depending on the data used. Few papers analyzed tweets in Arabic compared with English, because Arabic is complex to parse and analyzing sentiment in Arabic makes the task more challenging.

Keywords Sentiment analysis · Terrorism · ISIS · Twitter · Terrorist detection

1 Introduction

Terrorism has become a global issue. Although terrorism is not new, terrorist groups remain very active in many countries, with numerous supporters around the world, and terrorists continue to adopt advanced yet low-resource tactics to launch highly violent attacks and recruit people. The FBI defines terrorism as “the unlawful use of force and violence against people or property to intimidate or compel a government, in furtherance of political or social objectives or Sectarianism” [1].

With the progress of technology, social media such as Twitter are used by terrorist groups for spreading their ideas, recruitment, funding, and incitement to terrorist operations [2]. There are many terrorist groups and organizations, but the most prevalent one in this area is ISIS, which goes by different names: Daesh, an acronym of the Arabic phrase “al-Dawla al-Islamiya al-Iraq al-Sham” (Islamic State of Iraq and Sham) [3]; ISIS, the English abbreviation of “Islamic State of Iraq and Syria”; and IS, “Islamic State” [4]. ISIS has succeeded in using this modern technology for recruitment, message dissemination, and even plotting attacks, turning traditional war into an electronic war and using social media platforms to strengthen its presence in the world [5].

In this work, we examine the strategies and techniques that have been applied to sentiment analysis for classifying tweets, and we found that machine learning algorithms are the most used and the most accurate for sentiment analysis. The benefit of this work is a survey of the papers that assist in detecting terrorists, or even terrorist supporters, on Twitter so that their accounts can be closed.

2 Twitter

Twitter is considered one of the most famous social networks in the world.
Twitter offers a microblogging service that allows a user to send “tweets” that can be retweeted or liked by other users, posted directly through Twitter, by short message, or through other applications. These updates show up on the user’s Twitter page, and friends can read them through their home page or by visiting the user’s profile [6]. Twitter is the most popular and fastest-growing microblogging service: the number of tweets increased from 400,000 in the second quarter of 2007 to 100 million in the first quarter of 2008, and in February 2010 Twitter users sent 50 million tweets per day [7]. As of June 2010, about 65 million tweets were published daily, equal to about 750 tweets per second [8], and Twitter continued to grow to more than 330 million monthly active users in the first quarter of 2019, as shown in Fig. 1. Twitter ranked second in the list of most popular social networking sites in January 2009, up from 22nd earlier [9].

Fig. 1 The number of monthly active Twitter users worldwide from the first quarter of 2010 to the first quarter of 2019 (in millions). Source: Twitter [10]

Twitter has continued to grow and to roll out updates. For example, Twitter announced in 2016 that images and videos could be published without counting toward the 140-character limit, and that attachments and links would likewise no longer count toward it; in 2017, Twitter expanded the number of characters allowed per tweet from 140 to 280 [11]. These changes helped make Twitter one of the most popular social networking sites worldwide. Twitter is considered one of the most suitable social platforms for news, public relations, media, political and terrorist activities, etc. For all these reasons, terrorist organizations have exploited Twitter as a perfect platform to spread their ideas, propagate their messages, recruit new members, and even plot attacks.
It is considered a potential facilitator of terrorist activities, but it can also be seen as having a robust deterrent effect on them, as in the civilian response during the 2009 Jakarta and Mumbai terrorist attacks [12].

3 Sentiment Analysis (SA)

Sentiment analysis (SA) is the process of mining attitudes, opinions, views, and emotions from tweets or other written text. These terms differ: a sentiment is an opinion that expresses personal feelings; an opinion is a conclusion that is open to argument (because different experts have different opinions); a view is a personal opinion; and a belief is deliberate approval and ideological assent [13]. Natural language processing (NLP), statistics, or machine learning techniques can be used to perform SA on the text of Twitter users and then classify its polarity. The task of sentiment analysis is to find out whether the opinion expressed in a text is positive, negative, or neutral. The purpose of SA is therefore to find opinions and determine the sentiments they express about current local and international issues and events. Such information is useful for understanding people's opinions about terrorism and crime, as well as other areas such as political elections, sports, trademarks, products, tourism, and celebrities [14]. The sentiment classification techniques are illustrated in Fig. 2.

Fig. 2 Sentiment classification techniques. Source: our work, based on [15]

4 Literature Review

Many works use sentiment analysis for terrorist detection with different techniques. Cheong and Lee [12] proposed a new framework consisting of four phases (breaking news; data harvesting and spam filtering; sentiment detection and demographics; and data mining and reporting). This work used Twitter’s microblogging service as a multi-faceted service.
The data source for sentiment data mining (which used the self-organizing map algorithm for clustering) was a demographic analysis of the civilian response to terrorism, based on synthetic trial data. They showed that the proposed framework produced meaningful graphs of information scenarios for detecting potential terrorist threats and responses, and helped in understanding the capacity of content mining on unstructured data to extract deep knowledge from Twitter messages. The limitation of the study is the need for real-world data for a strong analysis.

Bolla [16] examined geographic data in tweets to provide a clear view of crime trends in various cities. Sentiment analysis techniques (an ANEW-based technique and a deep learning model) were applied to 100,000 tweets to analyze the crime intensity of a particular location, and the results of this research were positive. The advanced sentiment analysis algorithm helps in identifying potential offenders from tweets within a particular location and may be more accurate when applied to other media, e.g., Facebook, Google+, Tumblr, and MySpace. The results of this method helped to detect crime patterns, but the sentiment analysis techniques did not give appropriate results every time, and researchers should consider how to improve them.

In Omer’s work [17], three datasets were collected: tweets from accounts that support ISIS (TW-PRO), tweets from anti-ISIS accounts (TW-CON), and random tweets that have no relationship to ISIS (TW-RAND). The number of collected tweets was 135,608, and three types of features were used: stylometric, time-based, and sentiment-based features. After feature selection, 619 features remained. The study presented a method for classifying tweets as radical or not by applying support vector machine (SVM), Naive Bayes (NB), and AdaBoost.
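As a rough illustration of one of the classifiers named above, the following sketch implements a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is not the implementation used in [17]: the whitespace tokenizer, the toy sentences, and the positive/negative labels are hypothetical and merely stand in for the features and classes of the original study.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, not taken from any surveyed dataset.
train_texts = [
    "great match and a wonderful evening",
    "wonderful news great day",
    "terrible traffic and awful weather",
    "awful service terrible day",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("what a great and wonderful day")
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen word from driving a class probability to zero.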
The results differed depending on the datasets used: the algorithms gave better results on TW-PRO and TW-RAND than on TW-PRO and TW-CON. The final results using all the datasets were slightly better in accuracy for AdaBoost, with 100% correctly classified instances, while NB reached 99.9% and SVM 99.1%.

Kaati et al. [18] applied AdaBoost, a machine learning technique, with two sets of features, data-dependent and data-independent, to classify Arabic and English datasets of jihadist tweeps and tweets. The dataset was collected in two ways: from posts on Shumukh Al-Islam, and from tweets containing hashtags related to jihadists, especially ISIS. The English tweeps and tweets were classified well, and better than the Arabic data. For English tweets, the final results had high accuracy, precision, and recall: accuracy was 0.9907 with data-dependent features, 0.9882 with data-independent features, and 0.9951 with both sets of features. For Arabic, accuracy, precision, and recall were lower: the accuracy of classifying tweets was 0.824 with data-independent features, 0.8466 with data-dependent features, and 0.8638 with both sets of features.

Magdy et al. [19] used a Twitter dataset to study the antecedents of ISIS support across regions and over time, collecting 3.1 million Arabic tweets that refer to ISIS and classifying them as pro-ISIS or anti-ISIS. The classification relied simply on the observation that using the full name of the group (Islamic State) is a strong indicator of support, while abbreviations such as ISIS usually indicate opposition; it was then used to predict future support for or opposition to ISIS.
The accuracy was 87%. An SVM classifier was trained using the SVMLight application with a linear kernel and default parameters, and the features were bag-of-words features including individual terms, hashtags, user mentions, and geographic distribution. The researchers concluded that support for ISIS stems from disappointment with the results of the Arab Spring, whereas opposition to ISIS is connected with support for insurgent groups.

Ngoge [20] worked on four objectives: first, to determine the current state of terrorism; second, to identify terrorist activities; third, to study data mining techniques used in crime detection (Naive Bayes classifier, maximum entropy, SVM, and lexicon-based approaches); and fourth, to develop and test the system. The accuracy of the model was 73%, and the average recall and precision with which positive texts were predicted were 15% and 60%, respectively, on 346 tweets related to terrorism collected over a period of only seven days in Kenya. The distribution of sentiments on a map, indicated by markers, represented the patterns and trends of terrorist activities in Kenya. Law enforcement officers can use these indications as investigative leads and information that will help them disrupt, expose, and uncover terrorist networks and their structure effectively.

Mutlu et al. [21] applied K-nearest neighbor (KNN), Naive Bayes, and C4.5 decision tree algorithms to a dataset collected using the Twitter REST API, consisting of 95,578 tweets belonging to 3,321 users, and used the Hadoop/Mahout and Hadoop/Hive platforms for big data processing. The Weka tool was used to evaluate the proposed model, giving 89% accuracy for C4.5, 79% for Naive Bayes, and 83% for KNN.
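K-nearest neighbor, one of the algorithms compared above, can be sketched in a few lines over bag-of-words vectors. This is only a minimal illustration under stated assumptions, not the Hadoop/Mahout pipeline of [21]: the cosine similarity measure, the value k = 3, the example sentences, and the labels "troll" and "regular" are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as a word -> frequency mapping."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Majority vote among the k training texts most similar to `text`.

    `train` is a list of (text, label) pairs.
    """
    query = bag_of_words(text)
    ranked = sorted(
        train,
        key=lambda item: cosine_similarity(query, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples, invented for illustration only.
train = [
    ("you are all fools fools fools", "troll"),
    ("nobody cares fools go away", "troll"),
    ("go away nobody wants you", "troll"),
    ("thoughts with the victims today", "regular"),
    ("stay safe everyone today", "regular"),
    ("thank you to the emergency services", "regular"),
]

label = knn_predict(train, "stay safe and thank you everyone", k=3)
```

KNN needs no training step, but every prediction compares the query against all stored examples, which is one reason a scalable platform matters for datasets of the size reported above.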
This study found that the malicious users are “trolls” who stir up trouble around a sensitive issue like terrorism and who should be eliminated; the best troll detection performance was obtained with the C4.5 algorithm.

Ali [22] used data mining tools to extract and identify the vocabulary of terrorist organizations by analyzing tweets related to ISIS, in Arabic and English, collected from the Twitter API. The K-means algorithm was used to cluster the collected tweets and find recurring words by counting word frequencies; word clouds helped to guess the number of clusters, and the guesses were confirmed or refined with the elbow method. The next step was to apply K-means to identify the specific user accounts that frequently used the selected terms. As the last step, the user accounts were verified using network graphs of the users' networks drawn with NodeXL and Gephi, chosen because they are free and offer easy-to-use visualization. The study showed that these tools, if fully implemented, can greatly enhance law enforcement efforts to eliminate terrorist groups.

Mirani and Sasi [23] used a hybrid method combining data mining algorithms and geolocation, and presented a novel approach in their paper “Sentiment analysis of ISIS related Tweets using Absolute location (SITA)” for polarity-based classification of ISIS-related tweets. The sentiment analysis was done using the “Jeffrey Breen” algorithm. They compared the hashtags #ISLAMICSTATE, #ISIS, #ISIL, #DAESH, and #IS. The performance of the algorithms was measured by accuracy, F-measure, recall, and precision. Among the data mining algorithms used (support vector machine, maximum entropy, random forest, bagging, and decision trees), accuracy was highest with #ISIL, followed by geolocation. For each algorithm, the average accuracy was more than 90% after tenfold cross-validation.
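The evaluation measures that recur throughout these studies (accuracy, precision, recall, and F-measure) and the idea of tenfold cross-validation can be made concrete with a short sketch. The label vectors below are hypothetical, and the interleaved fold assignment is just one simple way to split the data; it is not claimed to be the procedure used in any of the surveyed papers.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 for one positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx); fold i tests every k-th sample from i."""
    for fold in range(k):
        test_idx = [i for i in range(n_samples) if i % k == fold]
        train_idx = [i for i in range(n_samples) if i % k != fold]
        yield train_idx, test_idx


# Hypothetical gold labels and predictions for ten tweets.
y_true = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "pos", "neg", "neg", "neg", "neg", "neg"]
result = scores(y_true, y_pred, positive="pos")

# Ten folds over twenty samples: each sample is tested exactly once.
folds = list(k_fold_indices(20, k=10))
```

In cross-validation, the scores are computed once per fold on the held-out test indices and then averaged, which is what an "average accuracy after tenfold cross-validation" refers to. Reporting precision and recall alongside accuracy matters when classes are imbalanced, as a model can reach high accuracy while missing most positive cases.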
The best result was 99% accuracy for maximum entropy after validation with #ISIL.

Ferrara et al. [24] used a dataset of millions of tweets from more than 25 thousand users who had been manually identified. The machine learning algorithms used were logistic regression and random forests, with the most important features applied to three tasks: identifying radical users, predicting the adoption of radical content, and predicting how radical users and regular users interact. Across the different scenarios, the best performance in AUC (area under the ROC curve) was more than 93% for radical user identification, above 80% for content adoption prediction, and more than 72% for predicting the interaction between these types of users.

Azrina Azizan and Abdul Aziz [25] conducted a comparative study of sentiment analysis techniques according to their results. In this study, machine learning had better accuracy than a lexicon-based approach, so the Naive Bayes technique was selected for their proposed system to improve the accuracy of the terrorism detection process. One strong point of this approach is that, after sentences are classified into three categories (positive, negative, and neutral), they are reclassified: the new categories of tweets from one specific account holder are compared with the previous ones, based on the sentiment scores of the latest and previous sentences.

Wang et al. [26] used machine learning techniques (support vector machine (SVM), maximum entropy (ME), tree, bagged tree, boosted trees, random forest (RF), neural network (NN), and Naive Bayes (NB)) with unigram, bigram, and trigram features.
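Unigram, bigram, and trigram features of the kind mentioned above can be extracted with a few lines of code. The sketch below is a minimal illustration that assumes whitespace tokenization and raw counts; the function names and the example sentence are hypothetical and do not come from [26].

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=3):
    """Count every n-gram of the text for n = 1 .. max_n."""
    tokens = text.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


# Hypothetical example sentence.
features = ngram_features("the new policy the new rules")
```

Here `features["the new"]` is 2 because the bigram occurs twice, while each trigram occurs once. Higher-order n-grams capture more context but are much sparser, which is consistent with the observation that adding them does not necessarily improve a classifier.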
These features were used to classify a dataset of more than 700,000 tweets purchased from a third party (GNIP) that is authorized to resell historical tweets. The data were obtained using filters based on time period, keywords, and geographic location. The GunsOnTwitter website provides a tool for interactively visualizing public sentiment over different time scales and by state. One of the main findings of this paper is that machine learning methods can effectively analyze a large dataset of tweets for sentiment analysis. RF gave the highest accuracy, 92%. Bagged tree and boosted tree ranked second, with almost the same accuracy of about 88.5%. SVM with n-gram models gave 85% accuracy. Using higher-order n-grams did not yield better features for this method, so its performance stayed the same.

Ahmed and Qadoos [27] collected data from the Twitter API. In the first step they used SentiWordNet and the TextBlob library, but these were not accurate; machine learning algorithms were therefore used, with accuracy of up to 83% for random forest and between 85 and 89% for the multinomial Naive Bayes classifier in detecting terrorism.

The main aim of the study by Ruhrberg et al. [28] was to identify possible differences between countries in the sentiment of ISIS-related tweets. They performed sentiment analysis on 500,000 tweets about the Islamic State, collected from the Twitter streaming API, determining whether their sentiment toward it was positive or negative. The sentiment analysis used the AFINN lexicon and the SentiStrength method. Political systems, geographic location, distance from the area where the Islamic State is active, and terrorist attacks are the factors that affect the SA.
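Lexicon-based scoring with word lists of the kind mentioned above can be sketched as follows. The mini-lexicon, its valence values, and the one-word negation rule are hypothetical and chosen only for illustration; they are not the actual entries or rules of any published lexicon or tool.

```python
# Hypothetical mini-lexicon: word -> integer valence (made-up values).
LEXICON = {
    "good": 3,
    "great": 3,
    "safe": 1,
    "bad": -3,
    "terrible": -3,
    "afraid": -2,
}

# Assumption: a negation word flips the valence of the next token only.
NEGATIONS = {"not", "no", "never"}


def score(text):
    """Sum the valence of each known word, flipping it after a negation."""
    total = 0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?#")
        if word in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(word, 0)
        total += -value if negate else value
        negate = False
    return total


def polarity(text):
    """Map the summed score to positive, negative, or neutral."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


labels = [
    polarity("What a great day!"),
    polarity("This is not good"),
    polarity("The meeting is on Monday"),
]
```

A scorer like this needs no training data, but it misses sarcasm, context, and words outside the lexicon, which helps explain why several of the surveyed studies found lexicon-based methods less accurate than machine learning.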
The most important result of this paper is that a negative attitude toward Daesh is predominant, with positive or neutral sentiment appearing to a lesser degree in some specific countries.

De Smedt et al. [29] proposed a method to automatically detect online jihadist hate speech using NLP and machine learning (ML) techniques on data collected from October 2014 to December 2016. The model was trained using the LIBSVM machine learning algorithm with balanced training data. The training phase was conducted on a corpus of 45,000 subversive tweets. The accuracy varied with the language: 84% for Arabic, 79% for English, 80% for French, 80% for Farsi, and 81% for Portuguese, with an overall accuracy of 82% (F1-score). This study helps to shed light on online hate speech, a relatively new phenomenon that needs to be studied, especially as there are not yet adequate regulations or technological answers for it.

Hernandez-Suarez et al. [30], in their work “Social Sentiment Sensor in Twitter for Predicting Cyber-Attacks Using ℓ1 Regularization”, used a social sentiment sensor on Twitter to predict cyber-attacks. They collected 1,800,000 tweets in English through historical retrieval by querying Twitter search endpoints, and classified these tweets as negative, positive, or security-oriented. ℓ1 regularization was used for the classifiers (Naive Bayes, maximum entropy, and support vector machines). The proposed strategy could serve to warn of potential cyber-attacks.

Alkhalisy and Jehlol [31] collected about 10,322 tweets containing terrorism keywords, cleaned and preprocessed them, and then converted the data into a corpus. The study involved two tasks; the first was analyzing Twitter data and mapping sentiment with GeoJSON, which is used to locate terrorists.
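Sentiment mapping of this kind is commonly expressed in the GeoJSON format, where each geotagged tweet becomes a point feature carrying its sentiment as a property. The sketch below shows one possible encoding using only the standard library; the record fields, coordinates, and texts are hypothetical and are not data from [31].

```python
import json


def to_feature(lon, lat, sentiment, text):
    """Build one GeoJSON Point feature; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"sentiment": sentiment, "text": text},
    }


def to_feature_collection(records):
    """Wrap a list of record dicts into a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [to_feature(**record) for record in records],
    }


# Hypothetical classified tweets with approximate, made-up coordinates.
records = [
    {"lon": 44.37, "lat": 33.31, "sentiment": "negative", "text": "example tweet 1"},
    {"lon": 44.31, "lat": 32.00, "sentiment": "neutral", "text": "example tweet 2"},
]

collection = to_feature_collection(records)
geojson_text = json.dumps(collection, indent=2)
```

The resulting string can be loaded by most web mapping libraries, which can then color each marker by the `sentiment` property. Note that GeoJSON orders coordinates as longitude first, then latitude.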
The word list of synonyms and antonyms associated with terrorism was obtained from a dictionary and then classified as positive or negative. The suggested method is based on the bag-of-words feature: the score of each text is computed by summing the scores of the words in the tweets that make up the training data. The classifier was Naive Bayes, which disregards the position of a word in the document. Based on the training data, Naive Bayes rated each tweet as positive, negative, or neutral, and 7122 tweets were classified as negative.

5 Conclusion

Terrorism has no borders and is not limited to one country. Terrorism exists everywhere and is carried out by people from various regions who speak different languages and belong to different religions. The purpose of this study was to show the methods that can give a better understanding, with the help of SA, of data from Twitter, using ML algorithms (Bayesian networks, Naive Bayes classification, maximum entropy, neural networks, support vector machine) and lexical methods (dictionary-based approach, corpus-based approach), in order to determine the approach with the highest accuracy. The main finding is that the accuracy of the results depends on the size of the data, the language of the data, the type of features used, and author factors, as shown in Fig. 1, which summarizes the reviewed studies with the algorithms used, the datasets, sentiment classifiers, conclusions, strengths, and weaknesses. Most researchers used ML approaches, which gave better results than lexicon-based approaches; the most used algorithms were support vector machine and Naive Bayes, and the highest accuracy was obtained with AdaBoost and maximum entropy.
From our viewpoint, the weaknesses of the collected works can be summarized as follows: the training set may be very small or not representative, the data may be noisy, or the features may be irrelevant; in addition, the model must be neither too simple, in which case the system will underfit, nor too complex, in which case it will suffer from strong overfitting. For all the above reasons, there are attempts to propose a hybrid method combining machine learning and lexicon-based approaches, or another hybrid method that can give better results.

References

1. Best S, Nocella A, Anthony J (2004) Defining terrorism. Anim Lib Philos Policy J 2(1):1–18
2. Bodine-Baron E, Helmus TC, Magnuson M, Winkelman Z (2016) Examining ISIS support and opposition networks on Twitter
3. Oakley N, Chakrabati S (2015) What does Daesh mean? ISIS ‘threatens to cut out the tongues’ of anyone using this word. Mirror 2
4. Wood G (2015) What ISIS really wants. The Atlantic 315(2):78–94
5. Badawy A, Ferrara E (2018) The rise of jihadist propaganda on social networks. J Comput Soc Sci 1(2):453–470
6. Java A, Song X, Finin T, Tseng B (2007) Why we twitter: understanding microblogging usage and communities. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on web mining and social network analysis, pp 56–65
7. Beaumont C (2011) Twitter users send 50 million tweets per day–almost 600 tweets are sent every second through the microblogging site, according to its own metrics. Dly Telegr (London) 7
8. Garrett S (2010) Big goals, big game, big records. Twitter Blog (http://blog.twitter.com/2010/06/biggoals-big-game-big-records.html)
9. Lunden I (2012) Analyst: Twitter passed 500 m users in June 2012, 140 m of them in US; Jakarta ‘Biggest Tweeting’ city. TechCrunch RSS 30
10. Twitter I (2018) Number of monthly active Twitter users worldwide from 1st quarter 2010 to 4th quarter 2017 (in millions). Technical report, Statista
11. Rosen A (2017) Tweeting made easier. Nov 2017
12. Cheong M, Lee VCS (2011) A microblogging-based approach to terrorism informatics: exploration and chronicling civilian sentiment and response to terrorism events via Twitter. Inf Syst Front 13(1):45–59
13. Liu B (2012) Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol 5(1):1–167
14. Liu B (2015) Opinions, sentiment, and emotion in text. Cambridge University Press, Cambridge
15. Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J 5(4):1093–1113
16. Bolla R (2014) Crime pattern detection using online social media. Masters theses
17. Omer E (2015) Using machine learning to identify jihadist messages on Twitter. Examensarbete
18. Kaati L, Omer E, Prucha N, Shrestha A (2016) Detecting multipliers of Jihadism on Twitter. In: Proceedings of 15th IEEE international conference on data mining workshops (ICDMW 2015), pp 954–960
19. Magdy W, Darwish K, Weber I (2016) #FailedRevolutions: using Twitter to study the antecedents of ISIS support. First Monday 21(2)
20. Ngoge LA (2016) Real–time sentiment analysis for detection of terrorist activities in Kenya. Strathmore University
21. Mutlu B, Mutlu M, Oztoprak K, Dogdu E (2016) Identifying trolls and determining terror awareness level in social networks using a scalable framework. In: Proceedings of 2016 IEEE international conference on big data, pp 1792–1798
22. Ali GA (2016) Identifying terrorist affiliations through social network analysis using data mining techniques
23. Mirani TB, Sasi S (2016) Sentiment analysis of ISIS related Tweets using absolute location. In: 2016 international conference on computational science and computational intelligence (CSCI), pp 1140–1145
24. Ferrara E, Wang WQ, Varol O, Flammini A, Galstyan A (2016) Predicting online extremism, content adopters, and interaction reciprocity. Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 10047 LNCS, pp 22–39
25. Azrina Azizan S, Abdul Aziz I (2017) Terrorism detection based on sentiment analysis using machine learning. J Eng Appl Sci 12(3):691–698
26. Wang N, Varghese B, Donnelly PD (2017) A machine learning analysis of Twitter sentiment to the Sandy Hook shootings. In: Proceedings of 2016 IEEE 12th international conference on e-Science, pp 303–312
27. Ahmed S, Qadoos M. Terrorism detection by tweet sentimental analysis
28. Ruhrberg SD, Kirstein G, Habermann T, Nikolic J, Stock WG (2018) #ISIS—a comparative analysis of country-specific sentiment on Twitter. Open J Soc Sci 06(06):142–158
29. De Smedt T, De Pauw G, Van Ostaeyen P (2018) Automatic detection of online jihadist hate speech, Feb 2018
30. Hernandez-Suarez A et al (2018) Social sentiment sensor in Twitter for predicting cyber-attacks using ℓ1 regularization. Sensors (Basel) 18(5)
31. Alkhalisy M, Jehlol H (2018) Terrorist affiliations identifying through Twitter social media analysis using data mining and web mapping techniques, vol 13