Jorgensen, J. (2006) Applications of Confidence Intervals to Text-Based Social Network Construction. 2006 Hollis Awards Final Competition.

Application of Confidence Intervals to Text-based Social Network Construction
By
CDT Julie Jorgensen, 06, G4
Advisors: MAJ Ian McCulloh, D/MATH
LTC John Graham, D/BS&L
Agenda






The Real-World Problem
Text Analysis/Social Network Analysis Solution
 Social Network Analysis
 Simple Text Analysis
A Better Solution
 Themed Analysis
 Example Case – Jihadist Texts
 Theme Scores
Network Construction Procedure
 Jihadist Network
Results
Importance and Conclusions
The Real-World Problem


Commanders need to understand “Human Terrain”
Majority of ‘HT’ information is in text form
The Combating Terrorism Center receives volumes of data every day.
Harmony Database is being rapidly declassified
Need an efficient way to plow through large amounts of text data and see the linkages.
Solution: Text Analysis Displayed in Social Network Analysis
Social Network Analysis


A mathematical method of quantifying connections between individuals or groups and drawing conclusions from those connections
Assumes rational beings are interdependent
 Nodes: Key Actors
 Links: Relationships between Nodes
“Human Terrain” Example:
9/11 Hijacker Network
Iraq Elections
Barzani
Khamenei
Demonstration Data Set:
Jihadist Texts

Approx. 250 translated texts
Sources
 MEMRI
 FBIS
 Other
15 Authors
 More than 1 text
 Not well known
Simple Text Analysis: The Plagiarism Check
Problem
 Word matching is overly simple.
 Ignores context
 Actors can be overly weighted by writing more
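The over-weighting problem can be illustrated with a minimal sketch. The function name `shared_word_count` and the sample strings are hypothetical, and the matching rule (count every word occurrence that also appears in the other text) is an assumption about what a simple word-matching check does, not the method used in the study.

```python
def shared_word_count(text_a, text_b):
    """Count word occurrences in text_a that also appear anywhere in text_b."""
    vocab_b = set(text_b.lower().split())
    return sum(1 for word in text_a.lower().split() if word in vocab_b)

# Hypothetical sample texts, for illustration only.
reference = "a raid is one kind of operation"
short_text = "the operation was a raid"
# Same content repeated three times: no new ideas, only more words.
long_text = " ".join([short_text] * 3)

print(shared_word_count(short_text, reference))
print(shared_word_count(long_text, reference))
```

The longer text scores three times higher against the same reference although it says nothing new, which is the sense in which an actor can be over-weighted simply by writing more.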
Alternative: Themed Analysis

Traditional Network Analysis Methods
 Citation Analysis
 Physical Network
 Communication or Financial Network
Themed Analysis
 Relates nodes across multiple fields
 One similar theme versus many similar themes
Demonstration: Text Analysis
Theme Scores
THEMES
ISLAM: allah, religion, islam, muslim, ummah, brother, book, messenger, prophet, mohammad
JIHAD: al_jihad, mujahid, attack, raid, defense, plane, bombing, operation, clash, fight, conflict
SALAF: salaf, sunnah, sallam
INFIDEL: infidel, apostate, heretic, kuffr, taghoot, idol
FOREIGNERS: united_states, government, al-Saud, Australia, Britain, Spain, Italy, France
SHEIKH: shaykh
BATTLEGROUNDS: Afghanistan, bosnia, two-rivers, iraq, palestine
JEWS: jews, zionists, usury, israel
*Theme Score is the sum of each word’s score per text
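A theme score of this kind can be sketched as follows. The slides do not say how an individual word's score is computed, so the per-word score used here (occurrences per 100 tokens) is purely an assumption, and the names `word_scores`, `theme_score`, and `sample` are hypothetical. The word lists are abbreviated from the themes listed above.

```python
import re
from collections import Counter

# Abbreviated theme word lists taken from the Theme Scores slide.
THEMES = {
    "ISLAM": ["allah", "religion", "islam", "muslim", "ummah"],
    "BATTLEGROUNDS": ["afghanistan", "bosnia", "two-rivers", "iraq", "palestine"],
}

def word_scores(text):
    """ASSUMED word score: occurrences of each word per 100 tokens."""
    tokens = re.findall(r"[a-z_\-]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {word: 100.0 * n / total for word, n in counts.items()}

def theme_score(text, theme_words):
    """Theme score = sum of the scores of the theme's words in one text."""
    scores = word_scores(text)
    return sum(scores.get(word, 0.0) for word in theme_words)

# Hypothetical nine-token sample sentence.
sample = "Allah guides the ummah and the ummah defends Iraq"
for theme, words in THEMES.items():
    print(theme, round(theme_score(sample, words), 2))
```

Normalising by text length is one way to limit the effect of long texts, but whether the original scores were normalised this way is not stated in the slides.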

Problem
 Commander needs information in representations he/she understands.
 Networks can compare authors across single themes
 But difficult to compare authors across multiple themes
Constructing a Network Across Multiple Themes






Scrub Texts
Construct Theme Scores
Construct Confidence Intervals
Discern Similarity between Nodes
 Binary or Standardized Difference of Means
Create Square Matrix
Draw Network
*why not ANOVA?
Confidence Intervals

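95% Confidence Interval = $\bar{x} \pm t \, \frac{s}{\sqrt{n}}$
 Each Author, Each Theme
 ($\bar{x}$: mean theme score over the author's $n$ texts; $s$: sample standard deviation; $t$: Student's t critical value with $n - 1$ degrees of freedom)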
Example:
Author: Mugrin    Theme: Islam

Text      Score
ctc127    0.7234
ctc126    0.7328
ctc125    0.5387
ctc124    0.668
ctc123    0.2012
ctc122    0.6931
ctc121    0.3977
ctc120    0.227
ctc119    0.0553
ctc118    0.823

Mean      Width       Low         High
0.50602   0.191819    0.314201    0.697839
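The Mugrin/Islam example can be checked with a short sketch. It assumes the interval is the usual t-based interval (mean plus or minus t times s over the square root of n), which reproduces the slide's Mean, Width, Low, and High to about four decimal places. The critical value is hard-coded for 9 degrees of freedom to keep the sketch free of third-party libraries; the function name `confidence_interval` is hypothetical.

```python
import math
import statistics

# Islam theme scores for Mugrin's ten texts (ctc127 ... ctc118), from the slide.
scores = [0.7234, 0.7328, 0.5387, 0.668, 0.2012,
          0.6931, 0.3977, 0.227, 0.0553, 0.823]

# Two-sided 95% critical value of Student's t for n - 1 = 9 degrees of freedom.
T_CRIT_DF9 = 2.262157

def confidence_interval(xs, t_crit):
    """Return (mean, half-width, low, high) of a t-based confidence interval."""
    n = len(xs)
    mean = statistics.mean(xs)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    width = t_crit * statistics.stdev(xs) / math.sqrt(n)
    return mean, width, mean - width, mean + width

mean, width, low, high = confidence_interval(scores, T_CRIT_DF9)
print(round(mean, 5), round(width, 4), round(low, 4), round(high, 4))
```

For authors with a different number of texts the critical value changes with the degrees of freedom, so a table lookup or a statistics library would be needed in place of the constant.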
Relationship Scores

Each possible pair of authors per theme
 Overlapping Confidence Intervals: $s_{i,j} = \frac{\mathrm{MaxDiff} - \mathrm{ActDiff}}{\mathrm{MaxDiff}}$
 Disparate Confidence Intervals: $s_{i,j} = 0$
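One possible reading of the relationship score is sketched below. The slides do not define MaxDiff or ActDiff, so both definitions here are assumptions: ActDiff is taken as the absolute difference between the two authors' mean theme scores, and MaxDiff as the largest difference of means at which the two intervals still overlap (the sum of the two half-widths). The function name `relationship_score` and the second author's values are hypothetical.

```python
def relationship_score(mean_i, width_i, mean_j, width_j):
    """Score in [0, 1] for one pair of authors on one theme.

    ASSUMPTIONS (not stated in the slides):
      ActDiff = |mean_i - mean_j|
      MaxDiff = width_i + width_j, the largest gap between means
                at which the two confidence intervals still touch.
    """
    act_diff = abs(mean_i - mean_j)
    max_diff = width_i + width_j
    if max_diff == 0:
        # Degenerate zero-width intervals: identical means overlap, others do not.
        return 1.0 if act_diff == 0 else 0.0
    if act_diff > max_diff:
        return 0.0  # disparate (non-overlapping) intervals
    return (max_diff - act_diff) / max_diff

# Mugrin's Islam interval from the example, against a hypothetical second author.
score = relationship_score(0.50602, 0.191819, 0.60, 0.10)
print(round(score, 3))
```

Under these assumptions an author compared with themselves scores 1, which agrees with the diagonal of the resulting square matrix, and any pair with disjoint intervals scores 0.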
Matrix Construction
• Multiplication of Scores for each author and each theme
 Geometric Mean = $\left( \prod_{i=1}^{n} a_i \right)^{1/n}$
• Resultant Square Matrix
Overall Theme Scores

            Mugrin      al-Iraqi    Alshareef   al Albanee  Ibn Baaz    Abdul Aziz  Azzam       At Tartusi
Mugrin      1.00000     0.76695     0.00000     0.00000     0.00000     0.00000     0.84938     0.00000
al-Iraqi    0.76695     1.00000     0.51748     0.00000     0.00000     0.00000     0.84449     0.00000
Alshareef   0.00000     0.51748     1.00000     0.75690     0.83688     0.00000     0.00000     0.00000
al Albanee  0.00000     0.00000     0.75690     1.00000     0.00000     0.00000     0.00000     0.00000
Ibn Baaz    0.00000     0.00000     0.83688     0.00000     1.00000     0.91174     0.82297     0.78024
Abdul Aziz  0.00000     0.00000     0.00000     0.00000     0.91174     1.00000     0.00000     0.00000
Azzam       0.84938     0.84449     0.00000     0.00000     0.82297     0.00000     1.00000     0.59977
At Tartusi  0.00000     0.00000     0.00000     0.00000     0.78024     0.00000     0.59977     1.00000
Maqdisi     0.84852     0.69722     0.00000     0.00000     0.80594     0.73681     0.93159     0.52446
Shuaibi     0.80676     0.82516     0.00000     0.00000     0.90168     0.52157     0.81534     0.81876
Al-Fahd     0.83939     0.81203     0.77599     0.00000     0.91619     0.85487     0.89227     0.82699
Madkhalee   0.00000     0.00000     0.00000     0.90076     0.00000     0.95733     0.00000     0.00000
Madhi       0.84403     0.72532     0.94616     0.00000     0.86383     0.88681     0.79010     0.00000
Al-Awdah    0.00000     0.00000     0.00000     0.00000     0.87589     0.94896     0.00000     0.00000
Qaradhawi   0.00000     0.00000     0.00000     0.00000     0.69418     0.00000     0.63895     0.00000

            Maqdisi     Shuaibi     Al-Fahd     Madkhalee   Madhi       Al-Awdah    Qaradhawi
Mugrin      0.84852     0.80676     0.83939     0.00000     0.84403     0.00000     0.00000
al-Iraqi    0.69722     0.82516     0.81203     0.00000     0.72532     0.00000     0.00000
Alshareef   0.00000     0.00000     0.77599     0.00000     0.94616     0.00000     0.00000
al Albanee  0.00000     0.00000     0.00000     0.90076     0.00000     0.00000     0.00000
Ibn Baaz    0.80594     0.90168     0.91619     0.00000     0.86383     0.87589     0.69418
Abdul Aziz  0.73681     0.52157     0.85487     0.95733     0.88681     0.94896     0.00000
Azzam       0.93159     0.81534     0.89227     0.00000     0.79010     0.00000     0.63895
At Tartusi  0.52446     0.81876     0.82699     0.00000     0.00000     0.00000     0.00000
Maqdisi     1.00000     0.77203     0.86424     0.00000     0.82544     0.76400     0.77915
Shuaibi     0.77203     1.00000     0.92896     0.00000     0.57030     0.64583     0.00000
Al-Fahd     0.86424     0.92896     1.00000     0.00000     0.80821     0.86983     0.00000
Madkhalee   0.00000     0.00000     0.00000     1.00000     0.00000     0.00000     0.00000
Madhi       0.82544     0.57030     0.80821     0.00000     1.00000     0.00000     0.00000
Al-Awdah    0.76400     0.64583     0.86983     0.00000     0.00000     1.00000     0.00000
Qaradhawi   0.77915     0.00000     0.00000     0.00000     0.00000     0.00000     1.00000
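The matrix construction step can be sketched as taking the geometric mean of one pair's per-theme relationship scores. The function name `combined_score` and the sample score lists are hypothetical; only the geometric mean formula comes from the slides.

```python
import math

def combined_score(theme_scores):
    """Geometric mean of one author pair's per-theme relationship scores.

    Because the scores are multiplied, a single theme with score 0
    (disparate confidence intervals) makes the combined entry 0.
    """
    n = len(theme_scores)
    if n == 0:
        raise ValueError("need at least one theme score")
    if any(s == 0 for s in theme_scores):
        return 0.0
    return math.prod(theme_scores) ** (1.0 / n)

# Hypothetical per-theme scores for two pairs of authors.
print(combined_score([0.25, 1.0]))        # two themes
print(combined_score([0.9, 0.0, 0.8]))    # one disparate theme zeroes the link
```

This would explain why so many entries of the square matrix are exactly 0: a pair is linked only when the confidence intervals overlap on every theme.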
Themed Network
Theme Analysis:
Confidence Interval vs Average
Theme Ranks
Author      islam  jihad  salaf  infidel  foreigners  battlegrounds  sheikh  jew  Average Rank
al-Fahd     10     5      6      3        7           6              2       9    5.57
Mugrin      6      4      12     6        1           4              11      9    6.29
Shuaibi     11     9      11     1        6           5              3       9    6.57
Azzam       12     2      10     7        8           3              7       8    7.00
Maqdisi     9      1      8      8        5           9              10      4    7.14
al-Iraqi    8      6      13     5        9           1              11      9    7.57
At-Tartusi  14     8      4      2        15          10             1       5    7.71
Abdul Aziz  5      10     9      4        10          12             5       6    7.86
Madhi       2      7      13     12       3           7              11      2    7.86
Qaradhawi   15     3      13     9        11          2              4       1    8.14
Alshareef   3      14     3      14       2           11             11      3    8.29
Madkhalee   4      13     2      11       12          12             6       9    8.57
Al-Awdah    13     11     7      10       4           8              8       7    8.71
al Albanee  1      14     1      14       14          12             11      9    9.57
Ibn Baaz    7      12     5      13       13          12             9       9    10.14
Able to look at each theme individually.
Average Rank does not account for connections, importance, weighting, or predictors.
Overall  Author      Texts  Weighted Degree  NrmDegree
1        Al-Fahd     8      9.389            70.053
2        Maqdisi     6      8.549            63.789
3        Ibn Baaz    10     8.41             62.745
4        Shuaibi     3      7.606            56.753
5        Madhi       4      7.26             54.17
6        Azzam       4      7.185            53.608
7        Abdul Aziz  4      5.818            43.41
8        al-Iraqi    7      5.189            38.714
9        Mugrin      10     4.955            36.971
10       Al-Awdah    16     4.105            30.625
11       Alshareef   2      3.833            28.602
12       At Tartusi  2      3.55             26.489
13       Qaradhawi   7      2.112            15.76
14       Madkhalee   7      1.858            13.864
15       al Albanee  2      1.658            12.368
Themes are combined.
Can see connections between authors across a combination of themes.
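The Weighted Degree column is consistent with summing an author's row of the square matrix while leaving out the diagonal entry, as the sketch below shows for Mugrin. The normalisation is an inference rather than something the slides state: the NrmDegree values match 100 times the weighted degree divided by (number of authors minus 1) times the largest off-diagonal link (0.95733, between Abdul Aziz and Madkhalee). The function names are hypothetical.

```python
# Mugrin's row of the square matrix, in the matrix's author order.
mugrin_row = [1.00000, 0.76695, 0.00000, 0.00000, 0.00000,
              0.00000, 0.84938, 0.00000, 0.84852, 0.80676,
              0.83939, 0.00000, 0.84403, 0.00000, 0.00000]

def weighted_degree(row, self_index):
    """Sum of link weights to all other authors (diagonal excluded)."""
    return sum(v for k, v in enumerate(row) if k != self_index)

def normalized_degree(degree, n_nodes, max_weight):
    """INFERRED normalisation: percent of the maximum attainable degree."""
    return 100.0 * degree / ((n_nodes - 1) * max_weight)

degree = weighted_degree(mugrin_row, self_index=0)
nrm = normalized_degree(degree, n_nodes=15, max_weight=0.95733)
print(round(degree, 3), round(nrm, 2))
```

Mugrin's row reproduces the tabulated weighted degree of 4.955 and a normalised degree close to the tabulated 36.971.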
Method Comparison
Themed Network Analysis   Plagiarism   Theme Ranks   Jihad Theme
Al-Fahd                   Al-Awdah     Al-Fahd       Maqdisi
Maqdisi                   Maqdisi      Mugrin        Azzam
Ibn Baaz                  Al-Albanee   Shuaibi       Qaradhawi
Shuaibi                   Al-Iraqi     Azzam         Mugrin
Madhi                     Azzam        Maqdisi       Al-Fahd
Legend: Top 5 Once / Top 5 Every Method
Conclusions







Socially Engineered Algorithms involve extensive tradeoffs and decisions by the mathematician that can significantly impact a commander’s decision-making.
Multiple views of the same data are a critical requirement.
Find Linkages in large amounts of data
Find Connections across multiple fields
Non-Tangible Relationships
Real World: Track / Catch criminals / radical ideologues
Representation of Human Terrain
Future Work
Publish method in Journal of Computational and Mathematical Organization Theory
Integration into ORA (Organizational Risk Analyzer) Statistical Software: In use by Intelligence Analysts.
Analysis of change over time
Questions?
References





Dr. Jaret Brachman. Combating Terrorism Center, USMA.
Dr. Steven Corman. Hugh Downs School of Human Communication, Arizona State University.
http://www.checkpoint-online.ch/CheckPoint/Images/NHusseinCapture.jpg
http://www.salmac.co.za/profile-writing-arabic.gif
Wasserman, Stanley and Katherine Faust. Social Network Analysis: Methods and Applications. New York: Cambridge University Press, 1994, 4.