Application of Confidence Intervals to Text-based Social Network Construction
By CDT Julie Jorgensen, '06, G4
Advisors: MAJ Ian McCulloh, D/MATH; LTC John Graham, D/BS&L

Agenda
• The Real-World Problem
• Text Analysis/Social Network Analysis Solution
• Social Network Analysis
• Simple Text Analysis
• A Better Solution
• Themed Analysis
• Example Case – Jihadist Texts
• Theme Scores
• Network Construction Procedure
• Jihadist Network Results
• Importance and Conclusions

The Real-World Problem
• Commanders need to understand the “Human Terrain” (HT).
• The majority of HT information is in text form.
• The Combating Terrorism Center receives volumes of data every day.
• The Harmony Database is being rapidly declassified.
• Need an efficient way to plow through large amounts of text data and see the linkages.
• Solution: text analysis displayed through social network analysis.

Social Network Analysis
• A mathematical method of quantifying connections between individuals or groups and drawing conclusions from those connections
• Assumes rational beings are interdependent
• Nodes: key actors
• Links: relationships between nodes
• “Human Terrain”
Example: 9/11 Hijacker Network
[Figure: Iraq Elections network; labeled nodes include Barzani and Khamenei]

Demonstration Data Set: Jihadist Texts
• Approx. 250 translated texts
• Sources: MEMRI, FBIS, other sources
• 15 authors
• More than 1 text each
• Not well known

Simple Text Analysis: The Plagiarism Check
Problem
• Word matching is overly simple.
• Ignores context
• Actors can be overly weighted by writing more
Alternative: Themed Analysis

Traditional Network Analysis Methods
• Citation analysis
• Physical network
• Communication or financial network
Themed Analysis
• Relates nodes across multiple fields
• One similar theme versus many similar themes

Demonstration: Text Analysis Theme Scores
THEMES
• ISLAM: allah, religion, islam, muslim, ummah, brother, book, messenger, prophet, mohammad
• JIHAD: al_jihad, mujahid, attack, raid, defense, plane, bombing, operation, clash, fight, conflict
• SALAF: salaf, sunnah, sallam
• INFIDEL: infidel, apostate, heretic, kuffr, taghoot, idol
• FOREIGNERS, SHEIKH: united_states, shaykh, government, al-Saud, Australia, Britain, Spain, Italy, France
• BATTLEGROUNDS: Afghanistan, bosnia, two-rivers, iraq, palestine
• JEWS: jews, zionists, usury, israel
* Theme Score is the sum of each word’s score per text.

Problem
• The commander needs information in representations he/she understands.
• Networks can compare authors across single themes.
• But it is difficult to compare authors across multiple themes.

Constructing a Network Across Multiple Themes
• Scrub texts
• Construct theme scores
• Construct confidence intervals
• Discern similarity between nodes (binary or standardized difference of means)
• Create square matrix
• Draw network
* Why not ANOVA?
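The theme-score step above can be sketched in a few lines of Python. The slides do not say how an individual word's score is computed, so this sketch assumes (hypothetically) that a word's score is its count divided by the number of tokens in the text; the function name `theme_scores` and the tokenizer are also illustrative choices, and only two of the theme word lists are copied in to keep the example short.

```python
import re
from collections import Counter

# Two of the theme word lists from the slide; the others would be added the same way.
THEMES = {
    "SALAF": ["salaf", "sunnah", "sallam"],
    "JEWS": ["jews", "zionists", "usury", "israel"],
}


def theme_scores(text, themes=THEMES):
    """Return one score per theme for a single text.

    ASSUMPTION: a word's score is (occurrences of the word) / (tokens in the text).
    The theme score is then the sum of its words' scores, as the slide states.
    """
    # Lowercase and keep runs of letters, underscores, and hyphens as tokens,
    # so entries such as "al_jihad" or "two-rivers" stay in one piece.
    tokens = re.findall(r"[a-z_-]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on an empty text
    return {
        theme: sum(counts[word] for word in words) / total
        for theme, words in themes.items()
    }


if __name__ == "__main__":
    print(theme_scores("The salaf followed the sunnah"))
```

Dividing by text length is one way to keep prolific authors from being over-weighted, which is the weakness of plain word matching noted earlier; a different word-scoring rule could be swapped in without changing the rest of the procedure.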
Confidence Intervals
• 95% confidence interval = $\bar{x} \pm t \, \frac{s}{\sqrt{n}}$
• One interval for each author and each theme

Example: Author Mugrin, Theme Islam

Text     Score
ctc127   0.7234
ctc126   0.7328
ctc125   0.5387
ctc124   0.668
ctc123   0.2012
ctc122   0.6931
ctc121   0.3977
ctc120   0.227
ctc119   0.0553
ctc118   0.823

Mean      Width      Low        High
0.50602   0.191819   0.314201   0.697839

Relationship Scores
• One score for each possible pair of authors per theme
• Overlapping confidence intervals: $s_{i,j} = \frac{\mathrm{MaxDiff} - \mathrm{ActDiff}}{\mathrm{MaxDiff}}$
• Disparate confidence intervals: $s_{i,j} = 0$

Matrix Construction
• Multiplication of scores for each author and each theme
• Geometric mean = $\left( \prod_{i=1}^{n} a_i \right)^{1/n}$
• Resultant square matrix

Overall Theme Scores

                 Mugrin   al-Iraqi  Alshareef al Albanee   Ibn Baaz Abdul Aziz      Azzam At Tartusi    Maqdisi    Shuaibi    Al-Fahd  Madkhalee      Madhi   Al-Awdah  Qaradhawi
Mugrin          1.00000    0.76695    0.00000    0.00000    0.00000    0.00000    0.84938    0.00000    0.84852    0.80676    0.83939    0.00000    0.84403    0.00000    0.00000
al-Iraqi        0.76695    1.00000    0.51748    0.00000    0.00000    0.00000    0.84449    0.00000    0.69722    0.82516    0.81203    0.00000    0.72532    0.00000    0.00000
Alshareef       0.00000    0.51748    1.00000    0.75690    0.83688    0.00000    0.00000    0.00000    0.00000    0.00000    0.77599    0.00000    0.94616    0.00000    0.00000
al Albanee      0.00000    0.00000    0.75690    1.00000    0.00000    0.00000    0.00000    0.00000    0.00000    0.00000    0.00000    0.90076    0.00000    0.00000    0.00000
Ibn Baaz        0.00000    0.00000    0.83688    0.00000    1.00000    0.91174    0.82297    0.78024    0.80594    0.90168    0.91619    0.00000    0.86383    0.87589    0.69418
Abdul Aziz      0.00000    0.00000    0.00000    0.00000    0.91174    1.00000    0.00000    0.00000    0.73681    0.52157    0.85487    0.95733    0.88681    0.94896    0.00000
Azzam           0.84938    0.84449    0.00000    0.00000    0.82297    0.00000    1.00000    0.59977    0.93159    0.81534    0.89227    0.00000    0.79010    0.00000    0.63895
At Tartusi      0.00000    0.00000    0.00000    0.00000    0.78024    0.00000    0.59977    1.00000    0.52446    0.81876    0.82699    0.00000    0.00000    0.00000    0.00000
Maqdisi         0.84852    0.69722    0.00000    0.00000    0.80594    0.73681    0.93159    0.52446    1.00000    0.77203    0.86424    0.00000    0.82544    0.76400    0.77915
Shuaibi         0.80676    0.82516    0.00000    0.00000    0.90168    0.52157    0.81534    0.81876    0.77203    1.00000    0.92896    0.00000    0.57030    0.64583    0.00000
Al-Fahd         0.83939    0.81203    0.77599    0.00000    0.91619    0.85487    0.89227    0.82699    0.86424    0.92896    1.00000    0.00000    0.80821    0.86983    0.00000
Madkhalee       0.00000    0.00000    0.00000    0.90076    0.00000    0.95733    0.00000    0.00000    0.00000    0.00000    0.00000    1.00000    0.00000    0.00000    0.00000
Madhi           0.84403    0.72532    0.94616    0.00000    0.86383    0.88681    0.79010    0.00000    0.82544    0.57030    0.80821    0.00000    1.00000    0.00000    0.00000
Al-Awdah        0.00000    0.00000    0.00000    0.00000    0.87589    0.94896    0.00000    0.00000    0.76400    0.64583    0.86983    0.00000    0.00000    1.00000    0.00000
Qaradhawi       0.00000    0.00000    0.00000    0.00000    0.69418    0.00000    0.63895    0.00000    0.77915    0.00000    0.00000    0.00000    0.00000    0.00000    1.00000

Themed Network
[Figure: themed network]

Theme Analysis: Confidence Interval vs Average

Theme Ranks
Author       islam  jihad  salaf  infidel  foreigners  battlegrounds  sheikh  jew  Average Rank
al-Fahd         10      5      6        3           7              6       2    9          5.57
Mugrin           6      4     12        6           1              4      11    9          6.29
Shuaibi         11      9     11        1           6              5       3    9          6.57
Azzam           12      2     10        7           8              3       7    8          7.00
Maqdisi          9      1      8        8           5              9      10    4          7.14
al-Iraqi         8      6     13        5           9              1      11    9          7.57
At-Tartusi      14      8      4        2          15             10       1    5          7.71
Abdul Aziz       5     10      9        4          10             12       5    6          7.86
Madhi            2      7     13       12           3              7      11    2          7.86
Qaradhawi       15      3     13        9          11              2       4    1          8.14
Alshareef        3     14      3       14           2             11      11    3          8.29
Madkhalee        4     13      2       11          12             12       6    9          8.57
Al-Awdah        13     11      7       10           4              8       8    7          8.71
al Albanee       1     14      1       14          14             12      11    9          9.57
Ibn Baaz         7     12      5       13          13             12       9    9         10.14

• Able to look at each theme individually.
• Average Rank does not account for connections, importance, weighting, or predictors.

Overall  Author      Texts  Weighted Degree  NrmDegree
1        Al-Fahd         8            9.389     70.053
2        Maqdisi         6            8.549     63.789
3        Ibn Baaz       10            8.41      62.745
4        Shuaibi         3            7.606     56.753
5        Madhi           4            7.26      54.17
6        Azzam           4            7.185     53.608
7        Abdul Aziz      4            5.818     43.41
8        al-Iraqi        7            5.189     38.714
9        Mugrin         10            4.955     36.971
10       Al-Awdah       16            4.105     30.625
11       Alshareef       2            3.833     28.602
12       At Tartusi      2            3.55      26.489
13       Qaradhawi       7            2.112     15.76
14       Madkhalee       7            1.858     13.864
15       al Albanee      2            1.658     12.368

• Themes are combined.
• Can see connections between authors across a combination of themes.
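The confidence-interval, relationship-score, and matrix steps can be sketched as follows. Several pieces are assumptions rather than the author's stated method: the slides do not define MaxDiff or ActDiff, so the sketch reads ActDiff as the absolute difference between two authors' means and MaxDiff as the sum of their interval half-widths (the largest difference at which the intervals still overlap). The t critical value 2.262157 (two-sided 95%, 9 degrees of freedom) is hard-coded for the ten-text Mugrin example, and all function names are illustrative. Reading the Weighted Degree column as a row sum of off-diagonal matrix entries is also an inference; it does reproduce the tabulated value for Qaradhawi.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom (n = 10 texts).
T_CRIT_DF9 = 2.262157


def confidence_interval(scores, t_crit):
    """Return (mean, half_width, low, high) for mean +/- t * s / sqrt(n)."""
    mean = statistics.mean(scores)
    half_width = t_crit * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width, mean - half_width, mean + half_width


def relationship_score(ci_a, ci_b):
    """Score two authors on one theme from their confidence intervals.

    ASSUMPTION: ActDiff = |mean_a - mean_b| and MaxDiff = half_width_a + half_width_b.
    Overlapping intervals score (MaxDiff - ActDiff) / MaxDiff; disparate ones score 0.
    """
    mean_a, width_a = ci_a[0], ci_a[1]
    mean_b, width_b = ci_b[0], ci_b[1]
    max_diff = width_a + width_b
    act_diff = abs(mean_a - mean_b)
    if max_diff == 0:
        return 1.0 if act_diff == 0 else 0.0
    if act_diff >= max_diff:
        return 0.0
    return (max_diff - act_diff) / max_diff


def geometric_mean(scores):
    """Combine one pair's per-theme scores; a zero in any theme gives zero overall."""
    return math.prod(scores) ** (1.0 / len(scores))


def weighted_degree(row, self_index):
    """Sum an author's matrix row, skipping the diagonal entry (assumed definition)."""
    return sum(value for k, value in enumerate(row) if k != self_index)


if __name__ == "__main__":
    # Mugrin's ten Islam theme scores from the example slide.
    mugrin_islam = [0.7234, 0.7328, 0.5387, 0.668, 0.2012,
                    0.6931, 0.3977, 0.227, 0.0553, 0.823]
    print(confidence_interval(mugrin_islam, T_CRIT_DF9))

    # Qaradhawi's row of the overall matrix (last entry is the diagonal).
    qaradhawi = [0, 0, 0, 0, 0.69418, 0, 0.63895, 0, 0.77915, 0, 0, 0, 0, 0, 1.0]
    print(round(weighted_degree(qaradhawi, 14), 3))
```

Because the geometric mean multiplies the per-theme scores, a single theme with disparate intervals zeroes the pair's overall score, which would be consistent with the many 0.00000 entries in the matrix.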
Method Comparison

Top 5 authors by method
Themed Network Analysis   Plagiarism   Theme Ranks   Jihad Theme
Al-Fahd                   Al-Awdah     Al-Fahd       Maqdisi
Maqdisi                   Maqdisi      Mugrin        Azzam
Ibn Baaz                  Al-Albanee   Shuaibi       Qaradhawi
Shuaibi                   Al-Iraqi     Azzam         Mugrin
Madhi                     Azzam        Maqdisi       Al-Fahd

• Top 5 Once: Al-Awdah, Ibn Baaz, Al-Albanee, Qaradhawi, Al-Iraqi, Madhi
• Top 5 Every Method: Maqdisi

Conclusions
• Socially engineered algorithms involve extensive tradeoffs and decisions by the mathematician that can significantly impact the commander’s decision-making.
• Multiple views of the same data are a critical requirement.
• Find linkages in large amounts of data
• Find connections across multiple fields
• Non-tangible relationships
• Real world: track and catch criminals and radical ideologues
• Representation of Human Terrain

Future Work
• Publish method in Journal of Computational and Mathematical Organization Theory
• Integration into ORA (Organizational Risk Analysis) statistical software, in use by intelligence analysts
• Analysis of change over time

Questions?

References
• Dr. Jaret Brachman. Combating Terrorism Center, USMA.
• Dr. Steven Corman. Hugh Downs School of Human Communication, Arizona State University.
• http://www.checkpoint-online.ch/CheckPoint/Images/NHusseinCapture.jpg
• http://www.salmac.co.za/profile-writing-arabic.gif
• Wasserman, Stanley and Katherine Faust. Social Network Analysis: Methods and Applications. New York: Cambridge University Press, 1994, 4.