2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
Conference Paper · August 2016 · DOI: 10.1109/ASONAM.2016.7752341

Temporal Analysis of Radical Dark Web Forum Users

Andrew J. Park∗, Brian Beck, Darrick Fletche, Patrick Lam, and Herbert H. Tsang†
∗ Department of Computing Science, Thompson Rivers University, Kamloops, British Columbia, Canada
† Applied Research Lab, Trinity Western University, Langley, British Columbia, Canada
Email: apark@tru.ca∗ and herbert.tsang@twu.ca†

Abstract—Extremist groups have turned to the Internet and social media sites as a means of sharing information amongst one another. This research study analyzes forum posts and finds people who show radical tendencies through the use of natural language processing and sentiment analysis. The forum data being used are from six Islamic forums on the Dark Web which are made available for security research. This research project uses a POS tagger to isolate keywords and nouns that can be utilized with the sentiment analysis program. Then the sentiment analysis program determines the polarity of the post. The post is scored as either positive or negative. These scores are then divided into monthly radical scores for each user.
Once these time clusters are mapped, the change in opinions of the users over time may be interpreted as rising or falling levels of radicalism. Each user is then compared on a timeline to other radical users and events to determine possible connections or relationships. The ability to analyze a forum for an overall change in attitude can be an indicator of unrest and possible radical actions or terrorism.

I. INTRODUCTION

Communication on a global scale has become commonplace and necessary to keep pace with the events of the world. This is applicable to nearly every member of society: from the average citizen conversing with relatives in another country, to government agencies coordinating relief efforts, to extremist or terrorist groups devising their next attack [1]. Radical groups such as ISIS (Islamic State of Iraq and Syria) require a means of spreading their propaganda and messages in the hopes of recruiting new members. The Internet is an all-encompassing network that allows such organizations to plan and carry out their oftentimes violent goals.

Recently, these extremist groups utilized mainstream social media, including sites such as Facebook and Twitter, for the purposes of communication and recruitment [2]. However, these companies have altered their terms and conditions in order to filter out and delete posts that contain violent and extremist content. In light of these changes, these groups have moved communications to a lesser known part of the Internet known as the Deep Web. The Deep Web contains websites that are not accessible or linked from common search engines like Google. To access the anonymized portion of the Deep Web, users require specialized software to reach a hidden network of routers. By design, this network allows for relative anonymity because, in order to trace back to a particular user, one would have to travel through a series of different routers around the globe.
This varying path through routers makes tracking very difficult. A subset of the Deep Web is the Dark Web. This section is home to a black market for many illicit goods, the most common being illegal drugs. It is also an area where users can post anonymously through certain forums. It is in these forums that radical users and groups can speak freely of their ideologies and opinions. Our research examines whether it is possible to analyze the opinion over time of these postings to identify users that are escalating in extremism and to correlate these escalations with real world terrorist events [3], [4], [5].

IEEE/ACM ASONAM 2016, August 18-21, 2016, San Francisco, CA, USA
978-1-5090-2846-7/16/$31.00 © 2016 IEEE

II. BACKGROUND

The Dark Web archive is a collection of critical jihadist and extremist web forums [6]. This archive has been made accessible to researchers and practitioners to study and analyze various social and computational problems. A number of studies using the Dark Web archive have tried to detect the indicators of extremism using techniques like data mining, textual analysis, and sentiment analysis [7], [8], [9], [10]. In particular, Scrivens et al. developed a method of identifying the most radical users across four Dark Web discussion forums using sentiment analysis techniques and their Sentiment-based Identification of Radical Authors (SIRA) algorithm [10]. However, their method simply looks at the posting lifetime of each user in the forums. This paper presents our preliminary research study, which furthers their work by investigating the change in opinion over time and possible temporal correlations between the forum users’ radicalism and real world terrorist events.

III. METHOD

Our analysis consists of five steps: a) Data Preparation, b) Part of Speech Tagging, c) Sentiment Analysis, d) Forum Temporal Analysis, and e) User Temporal Analysis. The following sections document the detailed procedure and analysis for each step.

A. Data Preparation

This research is based on forum data from six different forums on the Dark Web. The forums used are Gawaher, Islamic Network, Islamic Awakening, Turn to Islam, Myiwc, and Ummah. The forum posts range from the year 2000 to 2012. The forums have a total of 48,324 users, and the total number of posts in our analysis is 2,547,463. This time frame gives us a large amount of data to analyze, as well as the ability to correlate the forum data to terrorist events that have happened. The forum posts, originally in the form of text files, were transferred into an SQLite database for easier access and manipulation.

B. POS Tagging

Once the forum data were in the database, forum posts were analyzed to find sentiment scores. In this step, we examined the forum posts and broke them down into individual words and phrases for sentiment analysis in the next step. We employed OpenNLP for this particular process. OpenNLP (Open Natural Language Processor, http://opennlp.sourceforge.net ) is a free natural language processor from the Apache suite of open source software. This tool offers many different language processing options, but we mainly used the tokenizer, which splits plain text into an array of words and tokens, and the Parts of Speech (POS) tagger, which tags each word or token with its part of speech, e.g., noun, verb, or adjective.

The forum posts were first run through the OpenNLP tokenizer to create an array with each element containing a word, or a token such as punctuation. The array was then used by the POS tagger to find all of the nouns in each post and add them to a count that keeps track of the frequency of each noun. Once all the posts had been processed, we created a list of the top 100 most frequently used nouns. These 100 nouns gave us a picture of the topics being discussed, which matched topics that might have attracted radical users.

C. Sentiment Analysis (Opinion Mining)

After the forum posts were broken down, we performed the sentiment analysis on these posts. SentiStrength is a sentiment analysis tool [11]. It takes plain text as input and analyzes it for opinion scores that range from -5 (very negative) to 5 (very positive), with 0 being neutral. SentiStrength can also analyze the text around keywords instead of the overall text, so that the opinion scores are directed at certain topics. In our research, every post was analyzed using SentiStrength keyword analysis, with the 100 most frequent nouns obtained in the previous step as the keywords to focus the analysis around. This ensured that we were analyzing sentiment related to the topics of our research and leaving out any opinions on unrelated topics, which helped to create more accurate sentiment scores for our purposes. This analysis tagged each post in the database with a sentiment score ranging from -5 to 5.

D. Forum Temporal Analysis

The sentiment scores for posts from each forum were then combined on a month-to-month basis, creating an average monthly sentiment score per forum. This shows the average sentiment of all users in each forum and how it changes from one month to the next. This was used to track negative or positive spikes and correlate them to the months when major terrorist events occurred. In the data we examined for this paper (2000 to 2012), three terrorist events were selected where the forum temporal analysis showed drastic negative spikes in sentiment scores. The first event was the bombings in London, England on July 7, 2005. The second event was in Rawalpindi, Pakistan on December 27, 2007, where Benazir Bhutto was assassinated and 24 others were killed by suicide bombs. The last event was in Mumbai, India on November 26-29, 2008, where coordinated shootings and bombings rocked the city.

E. User Temporal Analysis

Using the three events chosen based on the forum temporal analysis from the previous step, the analysis of individual users was performed. The goal was to take a closer look at individual sentiment scores and how they would change on a day-to-day basis during the month of a major terrorist event. Initially, top 20 lists were created for the most negative and the most positive users based on their average sentiment scores over their forum posting lifetime. This was done to compare the difference between our negative users and positive users. Any users with fewer than 500 posts were eliminated due to limited forum use. This provided us with a base from which to start individual analysis around terrorist events. However, we soon realized that only 4-5 users from the top 20 negative users and 1-5 users from the top 20 positive users were active during the month of each event. Thus, top 10 lists for positive and negative users were drafted for each event based on the average sentiment scores of users during that month only, ensuring that all 10 users were active during the month of the event.

IV. RESULTS & DISCUSSION

From the graph of the total monthly forum data (Fig. 1), we can see that the total average sentiment score is negative, i.e., below zero. This indicates that a majority of the postings made by users were of a negative nature centered around the most used key nouns. Based on the negative spikes, we were able to find certain real world terrorist attacks that correlate with them. For most of the real world terrorist attacks considered, deep negative spikes occurred at the same time. We believe these spikes are caused by discussion of the terrorist events in question. One event in particular, the assassination of Benazir Bhutto, coincided very closely with a large dip in sentiment across almost all forums during that time.
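
As an illustration of the monthly averaging and the per-event user ranking described in the method, the following Python sketch runs both aggregations over an SQLite table. The schema (a `posts` table with `forum`, `user_id`, `posted_at`, and `sentiment` columns), the function names, and the toy rows are assumptions made for this example only; the paper does not specify the database layout or the queries that were used.

```python
import sqlite3


def monthly_forum_averages(conn):
    """Average sentiment per forum and calendar month (keyed by 'YYYY-MM')."""
    rows = conn.execute(
        """
        SELECT forum, strftime('%Y-%m', posted_at) AS ym, AVG(sentiment)
        FROM posts
        GROUP BY forum, strftime('%Y-%m', posted_at)
        ORDER BY forum, ym
        """
    ).fetchall()
    return {(forum, ym): avg for forum, ym, avg in rows}


def most_negative_users(conn, month, n=10):
    """The n users with the lowest average sentiment among those who
    posted during `month` ('YYYY-MM'), so every listed user was active."""
    return conn.execute(
        """
        SELECT user_id, AVG(sentiment) AS avg_score
        FROM posts
        WHERE strftime('%Y-%m', posted_at) = ?
        GROUP BY user_id
        ORDER BY avg_score ASC, user_id ASC
        LIMIT ?
        """,
        (month, n),
    ).fetchall()


def demo_connection():
    """In-memory database with invented rows (not data from the study)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (forum TEXT, user_id INTEGER, "
        "posted_at TEXT, sentiment INTEGER)"
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?, ?)",
        [
            ("Ummah", 1, "2005-06-10", 1),
            ("Ummah", 1, "2005-07-07", -4),
            ("Ummah", 2, "2005-07-08", -2),
            ("Gawaher", 3, "2005-07-09", 2),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for key, avg in monthly_forum_averages(conn).items():
        print(key, avg)
    print(most_negative_users(conn, "2005-07", n=2))
```

Ranking users within a single month, rather than over their posting lifetime, mirrors the paper's reason for drafting per-event top 10 lists: it guarantees that every ranked user was active during the month of the event. Reversing the sort order would give the corresponding list of most positive users.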
After looking at the forums as a whole, we began to look at individual users of the forums to see if any patterns could be extracted. Taking the 2005 London Bombing as an example, we mapped the top 5 negative users who were active during that time (Fig. 2). User 1558 seemed to be the only actively posting user during the time of the bombing. However, no real significance can be garnered from the user’s postings. The chart illustrating average scores during the attack (Fig. 3) shows a downward trend towards the event itself.

Fig. 1. Combined average sentiment scores over lifetime of forums with selected terrorist events.
Fig. 2. Top negative users during 2005 London Bombing.
Fig. 3. Average sentiment scores during 2005 London Bombing.

V. CONCLUSION AND FUTURE DIRECTIONS

This paper presents a novel way of analyzing the Dark Web forums through temporal analyses of forums and users, as well as sentiment analysis, to discover trends and patterns of sentiment scores over time that correlate with real world terrorist events. In this study, we have analyzed over two and a half million posts of Dark Web forum data. We were able to find correlations between overall monthly forum scores and real world terrorist events. These trends could be caused by discussion generated from the events themselves. Negative change in sentiment over time for a certain user can indicate their overall attitude towards certain issues, and may very well be cause for concern for law enforcement or government agencies. At this time, however, we cannot conclusively say whether a user experiencing a negative sentiment spike is truly radical or not. Close examination of the forum postings and content would be needed to determine whether they were truly a threat. Future research could include an automated solution to analyze Dark Web forum posts for extreme changes in sentiment.
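
One possible shape for such an automated check is sketched below in Python. It flags any month whose average sentiment falls well below the mean of the preceding months, measured in standard deviations. The rule itself, the function name `flag_negative_spikes`, the `window` and `threshold` parameters, and the sample values are all assumptions chosen for illustration; none of them is taken from the study.

```python
from statistics import mean, stdev


def flag_negative_spikes(monthly_scores, window=6, threshold=2.0):
    """Return labels of months whose score lies more than `threshold`
    sample standard deviations below the mean of the previous `window` months.

    `monthly_scores` is a chronologically ordered list of (label, score) pairs.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a deviation")
    flagged = []
    for i in range(window, len(monthly_scores)):
        label, score = monthly_scores[i]
        history = [s for _, s in monthly_scores[i - window:i]]
        spread = stdev(history)
        if spread == 0:
            continue  # flat history: a z-score is undefined, so skip
        if (score - mean(history)) / spread < -threshold:
            flagged.append(label)
    return flagged


if __name__ == "__main__":
    # Invented monthly averages, not results from the paper.
    series = [
        ("M1", -0.5), ("M2", -0.4), ("M3", -0.6),
        ("M4", -0.5), ("M5", -0.4), ("M6", -0.6),
        ("M7", -1.5),
    ]
    print(flag_negative_spikes(series))
```

A flag produced this way would only mark a month or user for closer reading. As noted above, a negative sentiment spike alone does not show that a user is radical.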

ACKNOWLEDGMENT

The authors would like to acknowledge the generous provision of the data by the Data Infrastructure Building Blocks (DIBBs) for ISI project funded by the National Science Foundation (ACI-1443019). We would also like to thank Thompson Rivers University and Trinity Western University for their support.

REFERENCES

[1] M. Ashcroft, A. Fisher, L. Kaati, E. Omer, and N. Prucha, “Detecting jihadist messages on twitter,” in EISIC 2015, September 7–9, Manchester, UK. IEEE Computer Society, 2015.
[2] M. J. G. Lapayese, “Terrorism and its transition to cyberspace,” in Intelligence and Security Informatics Conference (EISIC), 2015 European. IEEE, 2015, pp. 178–178.
[3] R. Colbaugh and K. Glass, “Estimating sentiment orientation in social media for intelligence monitoring and analysis,” in 2010 IEEE International Conference on Intelligence and Security Informatics (ISI). IEEE, 2010, pp. 135–137.
[4] B. Liu, “Sentiment analysis and opinion mining,” Synthesis lectures on human language technologies, vol. 5, no. 1, pp. 1–167, 2012.
[5] M. Thelwall, K. Buckley, and G. Paltoglou, “Sentiment strength detection for the social web,” Journal of the American Society for Information Science and Technology, vol. 63, no. 1, pp. 163–173, 2012.
[6] Y. Zhang, S. Zeng, C.-N. Huang, L. Fan, X. Yu, Y. Dang, C. A. Larson, D. Denning, N. Roberts, and H. Chen, “Developing a dark web collection and infrastructure for computational and social sciences,” in 2010 IEEE International Conference on Intelligence and Security Informatics (ISI). IEEE, 2010, pp. 59–64.
[7] H. Chen, W. Chung, J. Qin, E. Reid, M. Sageman, and G. Weimann, “Uncovering the dark web: A case study of jihad on the web,” Journal of the American Society for Information Science and Technology, vol. 59, no. 8, pp. 1347–1359, 2008.
[8] A. Abbasi, H. Chen, and A. Salem, “Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums,” ACM Transactions on Information Systems (TOIS), vol. 26, no. 3, p. 12, 2008.
[9] H. Chen, Dark web: Exploring and data mining the dark side of the web. Springer Science & Business Media, 2011, vol. 30.
[10] R. Scrivens, G. Davies, R. Frank, and J. Mei, “Sentiment-based identification of radical authors (sira),” 2015 IEEE International Conference on Data Mining Workshop (ICDMW), 2015.
[11] M. Thelwall, K. Buckley, and G. Paltoglou, “Sentiment strength detection for the social web,” Journal of the American Society for Information Science and Technology, vol. 63, no. 1, pp. 163–173, Jan. 2012. [Online]. Available: http://dx.doi.org/10.1002/asi.21662