Introduction

MINFS544: Network-based
Business Intelligence (BI)
Feb 19th, 2013
Daning Hu, Ph.D.,
Department of Informatics
University of Zurich
F Schweitzer et al. Science 2009
Stop Contagious Failures in Banking Systems
 During the 2008 financial tsunami, which bank(s) should we inject
capital into first to stop contagious failures in bank networks?
Utilize Peer Influence in Online Social Networks
 Intelligent Advertising, Product Recommendation
 Who are the most influential people?
 What are the patterns of information diffusion?
Develop Strategies to Attack Terrorist Networks
A Global Salafi Jihad Terrorist Network
Hu et al. JHSEM 2009
 How to effectively break down a terrorist network?
Network-based Business Intelligence
Network-based (Modeling and Analysis)
Modeling and analyzing various real-world social and organizational
networks to understand:
 the cognitive and economic behaviors of the network actors; and
 the dynamic processes behind the network evolution
Based on the above…
Business Intelligence (BI)
Design network-based BI algorithms and information systems to
provide decision support in various application domains
 Financial Risk Management, Security Informatics, and Knowledge
Management, etc.
 Network Analysis, Simulation of Network Evolution, Data Mining, etc.
MINFS544: Network-based Business Intelligence
• Lecturer: Dr. Daning Hu; Teaching Assistant: Dr. Jiaqi Yan
• Email: hdaning@ifi.uzh.ch jackiejqyan@gmail.com
• Credits: 3 ECTS credits
• Class Meetings: Tue 14:00–15:45 or Thu 10:15–12:00
(please see the schedule)
• Language: English
• Audience: Master and doctoral students
• Office Hours: Tue 13:00–14:00, Room 2.A.12
• Grading: Course report (term paper) 70%, presentation
20%, participation 10%
Grading
• 1. A full research paper (70%). The format of this paper can
be found at:
http://icis2012.aisnet.org/index.php/submissions
– * If possible, get it published in ICIS 2013 and get it cited.
• This paper should include answers to the following
questions:
– What is the problem?
– Why is it interesting and important?
– Why is it hard? Why have previous approaches failed?
– What are the key components of your approach?
– What 1) models, 2) data sets and 3) metrics will be used to validate
the approach?
Grading
• 2. Oral presentation of the paper (using slides) + Q&A
(20%)
• For presentations, please see the slides on how to give a good
research talk at:
• http://research.microsoft.com/en-us/um/people/simonpj/papers/giving-a-talk/giving-a-talk.htm
• 3. Active participation and interaction (10%)
Course Schedule
Date                  Event
19.02.13              Course introduction, Kick-off meeting
20.02.13 – 04.03.13   One to one meeting
05.03.13              Research method tutorial
06.03.13 – 18.03.13   One to one meeting on research progress
19.03.13              Lecture, feedback and discussion
20.03.13 – 10.04.13   Writing Research-in-Progress Paper (RIP)
11.04.13              Lecture, feedback and discussion
12.04.13 – 02.05.13   Writing full paper
25.04.13, 16.05.13    Presentation days, feedback and discussion
29.05.13              Final paper due

Deliverable
1 page summary due
3 pages literature review due
5 – 8 pages RIP due
8 – 12 pages full paper due (first deadline)
A Brief History of Network Science
1736
 Mathematical foundation – Graph Theory
1930
 Social Network Analysis and Theories
 Sociogram: Network visualization
 Six degrees of separation
 Structural hole: Source of innovation
1990
 (Physicists) Complex Network Topologies
 Small-world model (e.g., WWW)
 Scale-free model (“Rich get richer”)
2000
 Network Science
 Economic networks (Agent modeling & simulation)
 Dynamic network analysis
 BI applications: product diffusion in social media, recommendation systems
2012
 ?
Outline
 Introduction
 Dynamic Analysis of Dark Networks
 A Global Salafi Jihad (GSJ) Terrorist Network
 A Narcotic Criminal Network
 A Network Approach to Managing Bank Systemic Risk
 Ongoing Work
 Conclusion
Dynamic Network Analysis (DNA)
 Studying dynamic link formation processes behind
network evolution.
 Nodes forming links
Network Evolution
What
 Model the changes in network evolution
 Temporal changes in network topological measures
 Dynamic network recovery on longitudinal data
Why
 Statistical analysis of determinants behind link formation
 Homophily
 Preferential attachment
 Shared affiliations
How
 Simulate the evolution of networks
 Agent-based Modeling and Simulation
 Examine network robustness
Research Testbed: A Global Terrorist Network
 The Global Salafi Jihad (GSJ) network data was compiled by a
former CIA operations officer, Dr. Marc Sageman, and covers 366 terrorists
 friendship, kinship, same religious leader, operational interactions, etc.
 geographical origins, socio-economic status, education, etc.
 when they join and leave GSJ
 The goal of dynamic analysis
 gain insights about the evolution of GSJ network
 develop effective attack strategies to break down GSJ network
Sample data of GSJ terrorists
Dynamic Network Analysis
 Studying dynamic processes (i.e., link formation) behind
network evolution.
 Nodes’ behaviors
Network Evolution
What
 Model the changes in network evolution
 Temporal changes in network topological measures
 Dynamic network recovery on longitudinal data
Why
 Statistical analysis of determinants behind link formation
 Homophily
 Preferential attachment
 Shared affiliations
How
 Simulate the evolution of networks
 Agent-based Modeling and Simulation
 Examine network robustness
Temporal Changes in Network-level Measures
Fig.1. The temporal changes in (a) the average degree <k>, and (b), (c) the degree
distribution (probability of degree). The degree-distribution panels plot 1990, 1991 and 1993
against a Poisson curve, and 1995, 1997 and 1999.
Degree = number of links a node has
Findings
 The evolution of the GSJ network has three stages:
 1989 - 1993 The emerging stage:
 The network grows in size
 Accelerated growth - the number of edges increases faster than the number of nodes
 Random network topology (Poisson degree distribution)
 1994 - 2000 The mature stage:
 The size of the network reaches its peak in 2000
 Scale-free topology (Power-law degree distribution)
 2001 - 2003 The disintegration stage:
 The network falls apart into small disconnected components after 9/11
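The network-level measures behind these findings can be sketched in a few lines. The snippet below is a minimal illustration, not the analysis used in the study: it computes the average degree <k> and the empirical degree distribution of one undirected snapshot and prints a Poisson distribution with the same mean next to it. The toy nodes and edges are hypothetical; the GSJ data is not reproduced here.

```python
import math
from collections import Counter

def average_degree(edges, nodes):
    """<k> = 2 * (number of links) / (number of nodes), undirected network."""
    return 2 * len(edges) / len(nodes)

def degree_distribution(edges, nodes):
    """Return {degree: fraction of nodes that have this degree}."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    return {k: c / len(nodes) for k, c in sorted(counts.items())}

def poisson_pmf(k, mean):
    """Probability of degree k in a random network with average degree `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Hypothetical toy snapshot (one year of a network), for illustration only.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]

k_avg = average_degree(edges, nodes)
dist = degree_distribution(edges, nodes)
for k, p in dist.items():
    # empirical probability of degree k next to the Poisson reference value
    print(k, round(p, 3), round(poisson_pmf(k, k_avg), 3))
```

Repeating this per year gives the <k> curve, and a distribution that stays close to the Poisson reference suggests a random topology, while a heavy tail suggests a scale-free one.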
Temporal Changes in Node Centrality Measures
Figure.2. Temporal changes in Degree and Betweenness
centrality of Osama Bin Laden, 1989 - 2002 (two panels: Degree and Betweenness)

Degree: No. of links a node has

Betweenness of a node i
 No. of shortest paths from all nodes to
all others that pass through node i
 Measures i’s influence on the traffic
(information, resource) flowing through it
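As an illustration of the betweenness definition above, here is a minimal pure-Python sketch for unweighted, undirected networks. It uses Brandes' algorithm, which is one standard way to count shortest paths and not necessarily the tool used for the GSJ analysis. When several shortest paths connect a pair, each one contributes a fractional share. The toy network is hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Betweenness of every node; adj maps node -> set of neighbours.
    Each unordered pair of endpoints is counted once."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma)
        # and remember each node's predecessors on those paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path shares.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected network every pair was visited from both ends.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network: a chain a-b-c attached to the triangle c-d-e.
toy = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
scores = betweenness(toy)
print(scores)
```

In this toy network node c has the highest betweenness because every path between the chain and the triangle passes through it.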
Findings and Possible Explanations
 1994 – 1996: A sharp decrease in Bin Laden’s Betweenness
 1994: Saudi revoked his citizenship and expelled him
 1995: Went to Sudan and was expelled again under U.S. pressure
 1996: Went to Afghanistan and established camps there
 1998 –1999: Another sharp decrease in his Betweenness
 After 1998 bombings of U.S. embassies, Bill Clinton ordered a freeze
on assets linked to bin Laden (top 10 most wanted)
 August 1998: A failed assassination on him from U.S.
 1999: UN imposed sanctions against Afghanistan to force the Taliban
to extradite him
Research Testbed: A Narcotic Criminal Network
 The COPLINK dataset contains 3 million police incident
reports from the Tucson Police Department (1990 to 2006).
 3 million incident reports and 1.44 million individuals
 Their personal and sociological information (age, ethnicity, etc.)
 Time information: when two individuals co-offend
 AZ Inmate affiliation data: when and where an inmate was housed
 A Narcotic Criminal Network
 19,608 individuals involved in organized narcotic crimes
 29,704 co-offending pairs (links)
Table 1. Summary of the COPLINK dataset and the Arizona inmate dataset
                                        Number of People   Time Span
COPLINK Narcotic Data                   36,548             1990 - 2006
Arizona Inmate Data                     165,540            1985 - 2006
Overlapped (identified by first name,
last name and DOB)                      19,608             17 years
Statistical Analysis of Determinants for Link Formation
Proportional hazards model (Cox Regression Analysis)
 $h(t, x_1, x_2, x_3, \ldots) = h_0(t)\exp(b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots)$
 Homophily in age (group) and race
 Shared affiliations:
 Mutual acquaintances (through crimes)
 Vehicle affiliation (same vehicle used by two in different crimes)
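A short worked step shows how each coefficient of the proportional hazards model reads as a hazard ratio. Take two pairs of individuals that are identical except that covariate x_k (for example, the number of mutual acquaintances) is one unit higher for the first pair. The baseline hazard h_0(t) cancels, so the ratio does not depend on time:

```latex
\begin{align*}
h(t \mid x_1, \ldots, x_k, \ldots) &= h_0(t)\,\exp(b_1 x_1 + \cdots + b_k x_k + \cdots) \\
\frac{h(t \mid x_1, \ldots, x_k + 1, \ldots)}{h(t \mid x_1, \ldots, x_k, \ldots)}
  &= \frac{h_0(t)\,\exp(b_1 x_1 + \cdots + b_k (x_k + 1) + \cdots)}
          {h_0(t)\,\exp(b_1 x_1 + \cdots + b_k x_k + \cdots)}
   = \exp(b_k)
\end{align*}
```

A hazard ratio exp(b_k) above 1 means the covariate speeds up link formation, and a ratio below 1 means it slows it down.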
Fig.3. Results of multivariate survival (Cox regression) analysis of triadic
closure (link formation): hazard ratios for vehicle, mutualacq, age, race and gender.
BI Application: Co-offending Prediction in COPLINK
 IBM’s COPLINK is an intelligent police information system
that aims to help speed up the crime detection process.
 COPLINK calculates the co-offending likelihood score based
on the proportional hazards model.
 It returns a ranked list of individuals based on their predicted likelihood of
co-offending with the suspect under investigation.
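The ranking step can be sketched as follows. This is a hypothetical illustration, not COPLINK's implementation: the coefficient values, covariate names and candidate records are all made up, since the slides do not report fitted values. Because the baseline hazard h0(t) is the same for every candidate, sorting by exp(b1x1 + b2x2 + ...) gives the same order at any time t.

```python
import math

# Hypothetical coefficients b_k (assumed values, not taken from the study).
COEFFS = {
    "mutualacq": 1.2,       # number of mutual acquaintances
    "vehicle": 0.9,         # shared vehicle affiliation (0 or 1)
    "same_age_group": 0.4,  # homophily in age group (0 or 1)
    "same_race": 0.3,       # homophily in race (0 or 1)
}

def relative_hazard(x, coeffs=COEFFS):
    """exp(b1*x1 + b2*x2 + ...): hazard relative to the baseline h0(t)."""
    return math.exp(sum(coeffs[name] * x.get(name, 0) for name in coeffs))

def rank_candidates(candidates):
    """Candidate ids sorted from most to least likely to co-offend."""
    return sorted(candidates,
                  key=lambda cid: relative_hazard(candidates[cid]),
                  reverse=True)

# Hypothetical covariates of three candidates relative to one suspect.
candidates = {
    "P1": {"mutualacq": 2, "same_race": 1},
    "P2": {"vehicle": 1},
    "P3": {"same_age_group": 1, "same_race": 1},
}
ranking = rank_candidates(candidates)
print(ranking)
```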
Fig.4. Screenshots of the COPLINK system
Simulate Attacks on Dark Networks
 Three attack (i.e., node removal) strategies:
 Attack on hubs (highest degrees)
 Attack on bridges (highest betweenness)
 Real-world attack (attack order based on real-world data)
 Simulate two types of attacks to examine the robustness
of the dark networks
 Simultaneous attacks (the degree/betweenness of nodes is NOT
updated after each removal) – Static
 Progressive attacks (the degree/betweenness of nodes is
updated after each removal) – Dynamic
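The difference between simultaneous and progressive attacks can be sketched with a small simulation. The code below is a minimal illustration under stated assumptions: it implements only the hub strategy (a bridge attack would rank nodes by betweenness instead of degree), it measures robustness as S, assumed here to be the share of the original nodes left in the largest connected component, and the toy network is hypothetical.

```python
def largest_component_fraction(adj, n_original):
    """Assumed S: share of the original nodes in the largest component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / n_original

def remove_node(adj, v):
    """Delete node v and every link attached to it."""
    for w in adj.pop(v):
        adj[w].discard(v)

def hub_attack(network, n_remove, progressive):
    """Remove the n_remove highest-degree nodes; return S after each removal."""
    adj = {v: set(ns) for v, ns in network.items()}  # work on a copy
    n = len(adj)
    # Simultaneous (static): the order is fixed on the intact network.
    static_order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    curve = []
    for step in range(n_remove):
        if progressive:
            # Progressive (dynamic): degrees are recomputed after each removal.
            target = min(adj, key=lambda v: (-len(adj[v]), v))
        else:
            target = static_order[step]
        remove_node(adj, target)
        curve.append(largest_component_fraction(adj, n))
    return curve

# Hypothetical toy network; ties in degree are broken alphabetically.
toy = {
    "A": {"B", "C", "D", "E"}, "B": {"A", "C", "D"}, "C": {"A", "B"},
    "D": {"A", "B"}, "E": {"A", "H"}, "H": {"E", "I", "J"},
    "I": {"H"}, "J": {"H"},
}
static_curve = hub_attack(toy, 2, progressive=False)
dynamic_curve = hub_attack(toy, 2, progressive=True)
print(static_curve, dynamic_curve)
```

In this toy case both attacks first remove A. The static attack then removes B, which has already lost a link, while the progressive attack re-ranks the nodes and removes H, which fragments the network further.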
Hub Vs. Bridge Attacks
 Both hub and bridge attacks are far more effective than real-world
arrests. Policy implications?
 Both dark networks are more vulnerable to bridge attacks
than hub attacks.
 Bridge (highest betweenness): Field lieutenants, operational leaders, etc.
 Hub (highest degree): e.g., Bin Laden
Figure (GSJ): S and <s> plotted against the fraction of nodes removed, with curves for
S (Hub attacks) and S (Bridge attacks).
Summary and Contributions
 We developed a set of Dynamic Network Analysis (DNA)
methods that are effective in
 Linking network topological changes to analytical insights
 Systematically capturing the link formation processes
 Examining the determinants of link formation
 Dark networks are
 robust against real-world attacks
 but vulnerable to targeted bridge attacks
 COPLINK provides real-time decision support for fighting crimes.
Research Readings and Resources
• 1. Networks Overview:
• * Statistical mechanics of complex networks, Section III, VI
– http://rmp.aps.org/abstract/RMP/v74/i1/p47_1
• * Networks, Crowds, and Markets:
– http://www.cs.cornell.edu/home/kleinber/networks-book/
• 2. Networks in Finance:
• * Financial Networks blog and research databases:
– WRDS database
– http://www.financialnetworkanalysis.com/research-database/
– http://www.stern.nyu.edu/networks/electron.html
– * Company Board Social Networks
Research Readings and Resources (cont.)
• 3. Networks in Marketing:
– * Sinan Aral’s research in networks and marketing
– Peer influence
– http://web.mit.edu/sinana/www/
• * Social Media based Marketing:
– http://searchengineland.com/guide/what-is-social-media-marketing
• 4. Recommender Systems:
– http://www-cs-students.stanford.edu/~adityagp/recom.html
• 5. Word-of-Mouth Effects in Social Networks:
– http://papers.ssrn.com/sol3/papers.cfm?abstract_id=393042&