Knowledge Management, Social Network Analysis, and Knowledge Discovery for Homeland Security
Sidd Kaza
(sidd@u.arizona.edu)
MIS 480/580
Feb 20, 2007
Outline
• Knowledge Management using COPLINK
• Social Network Analysis of Criminal Networks
• Social Network Concepts using NetDraw
Knowledge Management using COPLINK
• COPLINK is a knowledge management system that integrates information from multiple law-enforcement agencies.
• It incorporates algorithms for cross-jurisdictional social network analysis, knowledge discovery, and visualization for intelligence, border safety, and national security applications.
Multiple Isolated Data Sources within a Single Agency
[Diagram: Tucson Police Department Records System, with separate Records Management System (RMS), Gang Database, and Mug Shots Database]
Isolated Agencies Share Limited Information through State and Federal Systems
[Diagram: Pima County Systems, Tucson Police Department Systems, Phoenix Police Department Systems]
Provide Access to Information using One Friendly Interface
[Diagram: Records Management System (RMS), Gang Database, Mug Shots Database]
Consolidated Information Provides Opportunities for Analytical and Data Analysis Applications
COPLINK™ Information Retrieval Interface
Query Parameters and Filters
Running the query with filters.
Person Search Results
A search for White males named Mike, aged 20 to 35, 5′5″ to 6′3″ tall and weighing 150 to 250 lbs, returns a generic set of results (24 persons).
Association Retrieval and Visualization
Spatio-temporal Analysis and Visualization
Outline
• Knowledge Management using COPLINK
• Social Network Analysis of Criminal Networks
• Social Network Concepts using NetDraw
Criminal Activity Networks
• Criminal Activity Networks (CANs) are networks of people, vehicles, and locations that are linked by law enforcement information.
• These networks allow us to understand the complex relationships between people and vehicles.
• Analyzing the topological characteristics of these networks helps us better understand the mechanisms that govern them.
• In this study we analyze the topological characteristics of CANs of people and vehicles in a multiple-jurisdiction scenario to support border and transportation security.
Literature Review
• Criminal Activity Network extraction
• Previous studies of complex networks
• Topological characteristics of networks
• The theory of growth in networks
Criminal Activity Network Extraction
• The extraction of CANs involves analyzing information from many different datasets.
• Accessing information from multiple sources poses many challenges that are documented in the literature. [Garcia-Molina, 2002; Rahm, 2001]
• This study uses the BorderSafe information sharing and analysis framework. [Marshall et al., 2004]
• Using the framework, law enforcement and other datasets are accessed in a form that is amenable to network extraction and analysis.
Complex Networks: Previous Studies
• There have been various studies to understand the characteristics of large and complex networks.
• The studies have explored the topology, evolution, robustness, and other properties of real-world networks.
– The World Wide Web [Albert, Jeong and Barabasi, 1999; Kumar et al., 2000]
– Cellular and metabolic networks [Jeong et al., 2000]
– Citation networks [Redner, 1998]
• Most real-world networks were found to have similar topological and evolutionary characteristics. [Albert and Barabasi, 2002]
Topological Characteristics
• Topological characteristics are used to study networks at a macro level.
• Three concepts dominate the statistical study of topology: [Albert and Barabasi, 2002]
– Small world
• Despite the large size of networks, nodes often have relatively short paths between them.
– Clustering
• The tendency of nodes to cluster together to form cliques, representing circles of friends in which every member knows every other member.
– Degree distribution
• The distribution of edges among nodes, where different nodes may have different numbers of edges.
Small World
• The small world concept is important because it describes how quickly communication can spread within a network.
• Communication can range from the spread of disease in human populations and of viruses on the Internet to the passage of messages and commands in a criminal network.
• The small world property of a network is measured by the average shortest path length (L). [Albert and Barabasi, 2002]
• The average shortest path lengths of many real networks have been measured.
– Movie actors were found to be an average distance of 3.65 from each other. [Watts & Strogatz, 1998]
– Average paths between co-authors in MEDLINE were 4.6. [Newman, 2001]
• Shortest path lengths of social networks are small due to the presence of shortcuts between otherwise distant people. [Watts, 1999; Nishikawa et al., 2002]
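As a rough illustration of how L (and the diameter reported later in this deck) can be computed, the Python sketch below runs a breadth-first search from every node of an unweighted, undirected network. It assumes the network is stored as a dict mapping each node to a set of neighbors; that representation and the function names are hypothetical, and skipping unconnected pairs is an assumption rather than something the slides specify.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def path_length_stats(adj):
    """Return (average shortest path length L, diameter).

    Only connected pairs are counted, so on a network with several
    components the figures describe paths inside those components.
    """
    total, pairs, diameter = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    if pairs == 0:
        return 0.0, 0
    return total / pairs, diameter
```

For the four-node path a-b-c-d this gives L = 5/3 and a diameter of 3. A BFS from every node is expensive for networks with tens of thousands of nodes, so in practice one might estimate L from a random sample of source nodes.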
Clustering
• Individuals in social networks often form cliques.
• Examples of cliques in social networks include authors collaborating together in a co-authorship network and websites pointing to each other on the web.
• The tendency to form cliques is measured by the clustering coefficient (CC). The CC of a node is the ratio of the number of edges that exist among its neighbors to the total number of possible edges among them; the CC of the network is the average over all nodes. [Albert and Barabasi, 2002]
• Real networks often have a much higher CC than random graphs of the same size and average degree:
– Movie actors: 0.79 [Watts & Strogatz, 1998]
– MEDLINE co-authorship: 0.066 [Newman, 2001]
• The CC in a criminal network points to the tendency of individuals to collaborate and partner in crimes.
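The clustering coefficient definition can be made concrete with a short Python sketch, again assuming a dict-of-sets adjacency representation (hypothetical). Giving nodes with fewer than two neighbors a coefficient of 0 is one common convention; other tools exclude such nodes from the average, so results can differ slightly.

```python
from itertools import combinations


def local_clustering(adj, node):
    """C_i = 2 * E_i / (k_i * (k_i - 1)), where E_i counts the links that
    actually exist among the k_i neighbors of `node`."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention used here: fewer than two neighbors -> 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Network CC: the mean of the local coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)
```

For a triangle a-b-c with an extra node d attached to a, this yields C_a = 1/3, C_b = C_c = 1, C_d = 0, and a network CC of 7/12.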
Degree Distribution
• Nodes in a network have different numbers of edges connecting them. The number of edges connected to a node is called its degree.
• The spread in node degrees is characterized by a distribution function P(k), which gives the probability that a randomly selected node has exactly k edges. [Albert and Barabasi, 2002]
• The degree distributions of most real-world networks follow power law scaling with varying exponents:
– Movie actors: exponent of 2.3. [Watts & Strogatz, 1998]
– MEDLINE co-authorship: exponent of 1.2. [Newman, 2001]
• In criminal networks, a high degree may indicate that an individual plays a leadership role. [Xu and Chen, 2004]
• The degrees of nodes are also used to study the growth and evolution of networks.
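A minimal Python sketch of P(k) and of the cumulative distribution (the fraction of nodes with degree at least k, the quantity plotted in the degree distribution figures later in this deck) follows. The dict-of-sets input and the function names are assumptions for illustration.

```python
from collections import Counter


def degree_distribution(adj):
    """P(k): fraction of nodes with exactly k edges."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


def cumulative_distribution(pk):
    """P_cum(k): fraction of nodes with degree >= k."""
    cum, running = {}, 0.0
    for k in sorted(pk, reverse=True):
        running += pk[k]
        cum[k] = running
    return dict(sorted(cum.items()))
```

For a star with one hub and three leaves, P(1) = 0.75 and P(3) = 0.25, and the cumulative values are 1.0 at k = 1 and 0.25 at k = 3. Plotting the cumulative form on log-log axes smooths the noisy, sparsely populated high-degree tail.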
Growth in Networks
• Most real-world networks (including CANs) are not static; they grow through the addition of nodes and/or edges.
• The growth of networks changes their topological characteristics.
• Two mechanisms govern evolving networks: [Barabasi and Albert, 1999; Dorogovtsev, Mendes and Samukhin, 2000; Newman, 2001]
– Growth: networks expand continuously by adding new nodes.
– Preferential attachment: new nodes attach preferentially to nodes that are already well connected.
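The two mechanisms can be sketched as a simple growth simulation in the spirit of the Barabási-Albert model. This is an illustrative sketch, not the method used in this study: the function name, the fully connected starting core, and the parameter m (links per new node) are assumptions.

```python
import random


def grow_preferential(n_nodes, m=1, seed=None):
    """Grow an undirected network one node at a time (hypothetical helper).

    Each new node links to m distinct existing nodes chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    adj = {i: {j for j in range(m + 1) if j != i} for i in range(m + 1)}
    # Each node appears in `targets` once per edge end, so a uniform draw
    # from this list picks a node with probability proportional to degree.
    targets = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        adj[new] = set()
        for old in chosen:
            adj[new].add(old)
            adj[old].add(new)
            targets.extend([new, old])
    return adj
```

Keeping a list with one entry per edge end avoids recomputing degree-weighted probabilities at every step; the degree distribution of networks grown this way develops a heavy, power-law-like tail.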
Preferential Attachment
• Network growth involves adding new nodes (and edges) to the set of current nodes.
• Preferential attachment assumes that the probability that a new node will connect to an existing node i depends on the degree of node i.
– The higher the degree of the existing node, the higher the probability that new nodes will attach to it.
• The functional form of preferential attachment, Π(k), can be measured for a network by observing the nodes present in the network, their degrees, and the new links they acquire as the network grows. [Albert and Barabasi, 2002]
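One simple way to estimate Π(k) is to compare two snapshots of the same network: group the nodes of the earlier snapshot by degree and measure how many new links each group gains. The sketch below does this in Python; the function name, the dict-of-sets snapshots, and the choice to normalize the per-degree rates so that they sum to 1 are assumptions, so the result is Π(k) only up to a constant.

```python
from collections import defaultdict


def attachment_rate(adj_before, adj_after):
    """Estimate Pi(k) up to a constant: the mean number of new links gained
    between two snapshots by nodes that had degree k in the first one."""
    gained = defaultdict(int)
    count = defaultdict(int)
    for node, nbrs in adj_before.items():
        k = len(nbrs)
        gained[k] += len(adj_after.get(node, set()) - nbrs)
        count[k] += 1
    rates = {k: gained[k] / count[k] for k in count}
    total = sum(rates.values())
    if total == 0:
        return rates  # no new links between the snapshots
    return {k: r / total for k, r in sorted(rates.items())}
```

Under linear preferential attachment the estimate should rise roughly in proportion to k; a curve that flattens at high k points to the kind of constraint discussed in the next slides.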
Preferential Attachment: Previous Studies
• Π(k) for co-authorship, citation, actor, and Internet networks was found to follow a power law. [Jeong, Neda and Barabasi, 2003; Newman, 2001]
• However, in some cases Π(k) may grow linearly up to a point and then fall off at high degrees. [Newman, 2001]
• This implies that high-degree nodes are unable to attract proportionally more new nodes.
• Constraints on growth are also seen in criminal networks.
Constraints on Growth of a Network
• Constraints on the number of links a node can attract may be due to: [Amaral et al., 2000]
– Aging: because networks grow over time, some high-degree nodes may become too old to participate in the network (e.g., actors in a movie network).
– Cost: it may become costly for a node to attach to a large number of nodes.
• Constraints on the growth of networks may be domain specific and have been studied in many domains:
– In plant-animal pollination networks, some animals cannot pollinate certain plants, so a link cannot be established. [Jordano, Bascompte and Olesen, 2003]
– In criminal networks, trust may restrict the growth of networks. Criminals and terrorists do not include many people in their inner trust circle. [Klerks, 2001]
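To see how a cost or trust constraint can cap growth, the sketch below modifies preferential attachment so that a node stops accepting links once its degree reaches k_max. The hard cap is our own simplification for illustration, not the model of Amaral et al.; the function name and parameters are hypothetical.

```python
import random


def grow_with_capacity(n_nodes, k_max, seed=None):
    """Preferential attachment (one link per new node) in which a node stops
    accepting links once its degree reaches k_max, a simple stand-in for a
    cost or trust limit (hypothetical helper)."""
    if k_max < 2:
        raise ValueError("k_max must be at least 2")
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    for new in range(2, n_nodes):
        # Only nodes below capacity can attract the newcomer.
        open_nodes = [v for v in adj if len(adj[v]) < k_max]
        weights = [len(adj[v]) for v in open_nodes]
        old = rng.choices(open_nodes, weights=weights, k=1)[0]
        adj[new] = {old}
        adj[old].add(new)
    return adj
```

Because well-connected nodes are cut off once they reach the cap, the high-degree tail of the resulting degree distribution falls off sharply instead of extending as a pure power law. The candidate list is rebuilt at every step, which keeps the code short but makes it quadratic, so it is only meant for small experiments.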
Research Questions
• What are the topological characteristics of criminal
networks?
• How does cross-jurisdictional data affect the
topological characteristics of criminal networks?
• How do criminal networks grow on adding data from
more jurisdictions?
Research Testbed
• The testbed for this study contains incident reports of all the individuals
and vehicles involved in crimes in the jurisdiction of Tucson Police
Department (TPD) and Pima County Sheriff’s Department (PCSD) from
1990 to 2002.
            Incidents       Individuals     Vehicles
TPD         2.99 million    1.44 million    675,000
PCSD        2.18 million    1.31 million    520,000
• A CAN consists of individuals and vehicles represented as nodes and
police incidents represented as edges.
• Two nodes have an edge between them when they are involved in the
same police incident.
• Narcotics networks are extracted from the testbed.
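The construction rule above (individuals and vehicles as nodes, an edge whenever two of them appear in the same incident) can be sketched in a few lines of Python. The input layout, a dict from incident ID to the IDs of the entities involved, is a hypothetical stand-in for the actual police records; keeping the incident IDs behind each link is an optional extra that lets a link be traced back to its source reports.

```python
from itertools import combinations


def build_can(incidents):
    """Build an undirected criminal activity network.

    `incidents` maps an incident ID to the individuals and vehicles
    involved in it (hypothetical record layout, string IDs assumed).
    Every pair of entities that appear in the same incident is linked.
    """
    adj = {}
    evidence = {}  # (u, v) -> set of incident IDs supporting the link
    for incident_id, entities in incidents.items():
        unique = sorted(set(entities))
        for e in unique:
            adj.setdefault(e, set())
        for u, v in combinations(unique, 2):
            adj[u].add(v)
            adj[v].add(u)
            evidence.setdefault((u, v), set()).add(incident_id)
    return adj, evidence
```

For example, incidents {"I1": ["P1", "P2", "V1"], "I2": ["P2", "P3"], "I3": ["P1", "P2"]} produce links P1-P2 (supported by I1 and I3), P1-V1, P2-V1, and P2-P3. Extracting a narcotics network would amount to filtering the incidents by crime type before calling the function.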
Research Design
• The study is divided into three parts:
– Characteristics of criminal networks in a single jurisdiction.
• Narcotics networks that include individuals and incidents reported
in a single jurisdiction are analyzed.
– Characteristics of the networks by combining data from
multiple jurisdictions.
• Narcotics networks including individuals and incidents reported in
both TPD and PCSD are analyzed.
• The implications of the topological properties of these
networks are explained in the law enforcement
domain.
Experiment Results
Narcotics Networks in a Single Jurisdiction
Basic Statistics
                          TPD                   PCSD
Nodes                     31,478 individuals    11,173 individuals
Edges                     82,696                67,106
Giant component           22,393 (70%)          10,610 (94%)
2nd largest component     41                    103
Link density              0.0002                0.0008
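The component and density statistics in the table can be computed as follows. The slides do not state which density formula was used; this Python sketch assumes the usual undirected definition (existing links divided by the N(N − 1)/2 possible links), and the function names and dict-of-sets input are hypothetical.

```python
from collections import deque


def component_sizes(adj):
    """Sizes of the connected components, largest (giant) first."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def link_density(adj):
    """Existing links divided by the N(N-1)/2 possible links."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1))
```

The share of nodes in the giant component, the percentage shown in the table, would be component_sizes(adj)[0] / len(adj), and the second entry of the list gives the size of the second largest component.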
Experiment Results
Single Jurisdiction (cont.)
Small World Properties
                                        TPD                    PCSD
Clustering Coefficient                  0.39 (1.39 × 10⁻⁴)     0.53 (4.08 × 10⁻⁴)
Average Shortest Path Length (L)        5.09                   4.62
Diameter                                22                     23
Values in parentheses are values for a random network of the same size and average degree.
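A random counterpart of the same size and average degree can be generated as an Erdős-Rényi graph G(n, p) with p = ⟨k⟩/(n − 1). The Python sketch below does this and measures its clustering coefficient; the function names are hypothetical, and the pair-by-pair loop is quadratic, so it is only meant for small illustrations rather than networks the size of the TPD data.

```python
import random
from itertools import combinations


def random_counterpart(n, avg_degree, seed=None):
    """Erdos-Renyi G(n, p) graph with p chosen to match the average degree."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    adj = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient (0 for nodes with degree < 2)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)
```

In G(n, p) two neighbors of a node are linked with probability p, so the expected clustering coefficient is about p, roughly ⟨k⟩/N. For large sparse networks this is tiny, which is why clustering coefficients of 0.39 and 0.53 stand out so sharply against their random counterparts.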
Implications of the Small World Property
• The narcotics networks in both jurisdictions can be classified as small world networks.
• The clustering coefficients of the networks are much larger than those of their random counterparts.
– This suggests that criminals tend to form circles of associates whose members commit crimes together.
– This is not unusual in narcotics networks, where an individual commits crimes with friends and people in his trust circle.
– This property works as an asset to law enforcement in identifying criminal conspiracies.
• A short L in a narcotics network has important implications for both crime and law enforcement:
– It speeds the flow of information and goods in the network.
– It also suggests that criminals often commit crimes with individuals outside their group. This creates the shortcuts needed to reduce L.
– A short average path length has positive implications for law enforcement too: short paths between criminals generate better leads in crime investigations.
Single Jurisdiction (cont.)
Degree Distributions
[Figure: two panels, "TPD Narcotics Network" and "PCSD Narcotics Network", plotting cumulative p(k) against k]
These diagrams show log-log plots of the cumulative degree distribution p(k) vs. the degree k. The insets show p(k) vs. k. The solid line is the truncated power law curve.
Implications of the Scale Free Property
• The narcotics networks in both jurisdictions can be classified as scale free (SF) networks.
• This implies that a large number of individuals have few associates, while a few individuals have a large number of associates.
• The exponents of both power law decays are very small (0.85–1.3). The distribution decays slowly at lower degrees, indicating that there are a large number of nodes with small degrees.
– This is not unexpected, as criminals with high degrees attract more attention from law enforcement authorities, so having fewer associates is beneficial.
• The truncated power law fits better (R² = 93%) than the power law distribution (R² = 85–87%).
– As the number of links (k) grows, the probability of a node having k links decreases faster than a pure power law predicts.
– This might indicate a cost or trust constraint on growth (criminals may not want to attach to many people).
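The comparison between a power law and a truncated power law can be sketched as two ordinary least-squares fits on log-transformed data: ln P = a − γ ln k for the power law, and ln P = a − γ ln k − k/κ for the truncated version, whose exponential cutoff κ makes the tail fall off faster. The slides do not describe the fitting procedure, so computing R² on the log values, the pure-Python solver, and all function names are assumptions. Degrees and probabilities must be positive, and more robust fitting methods (such as maximum likelihood) exist for real data.

```python
import math


def _solve(a, b):
    """Solve the small square system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(features, y):
    """Ordinary least squares via the normal equations; returns (coef, R^2)."""
    p = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(p)] for i in range(p)]
    xty = [sum(f[i] * yi for f, yi in zip(features, y)) for i in range(p)]
    coef = _solve(xtx, xty)
    pred = [sum(c * x for c, x in zip(coef, f)) for f in features]
    mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return coef, 1 - ss_res / ss_tot


def compare_fits(degrees, probs):
    """Fit ln P = a - gamma * ln k             (power law) and
           ln P = a - gamma * ln k - k / kappa (truncated power law)."""
    y = [math.log(p) for p in probs]
    pl, pl_r2 = least_squares([[1.0, math.log(k)] for k in degrees], y)
    tp, tp_r2 = least_squares([[1.0, math.log(k), float(k)] for k in degrees], y)
    kappa = -1.0 / tp[2] if tp[2] < 0 else math.inf  # inf: no cutoff detected
    return {
        "power_law": {"gamma": -pl[1], "r2": pl_r2},
        "truncated": {"gamma": -tp[1], "kappa": kappa, "r2": tp_r2},
    }
```

On synthetic data generated from P(k) ∝ k^(−1.2) e^(−k/15), the truncated fit recovers γ = 1.2 and κ = 15 and scores a higher R² than the pure power law, which is the same kind of comparison reported on this slide.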
Growth in Multiple Jurisdictions
This curve shows the preferential attachment when the narcotics network in TPD is augmented with data from PCSD.
[Figure: "Preferential Attachment (TPD < PCSD)", plotting K(k) against k]
The dashed line above the curve shows linear preferential attachment growth; the solid line shows the state of no preferential attachment.
Preferential Attachment: Implications
• The curves lie above and grow faster than the solid line, offering visual evidence of preferential attachment.
• Two properties of growth between jurisdictions are worth noting:
– The curve remains linear at low values of k; the linearity breaks down at higher degrees.
– Overall, lower degree nodes attract more new nodes than higher degree nodes.
Preferential Attachment: Implications (cont.)
• Break in Linearity
– The slow growth of nodes with high degrees can be attributed to the nature of the networks being studied.
– Cost/Trust effect: criminals may prefer not to be related to a large number of individuals, for fear of drawing attention. The cost of acquiring more links is therefore high, which may prevent a node that already has many links from acquiring more.
– External influences: law enforcement limits the number of crimes an individual can commit.
Preferential Attachment: Implications (cont.)
• Lower degree nodes attract more nodes
– The data on police incidents is drawn from two different jurisdictions.
– A criminal might commit more crimes in one jurisdiction than in the other.
– Thus, one jurisdiction may have incomplete information about the activity of some criminals in the network.
– These criminals will have a low degree in that jurisdiction.
– When the second jurisdiction's data is added, the degrees of these criminals increase, since they commit more crimes in the second jurisdiction.
– This leads to lower degree nodes attracting more nodes than higher degree nodes.
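The mechanism described above can be illustrated by merging two jurisdictions' networks and measuring how many links each node of the first network gains. This Python sketch assumes that entity IDs have already been matched across agencies (identity resolution is a separate and difficult step when accessing multiple sources); the function names and dict-of-sets inputs are hypothetical.

```python
def merge_networks(adj_a, adj_b):
    """Union of two jurisdictions' networks, assuming entity IDs are
    already matched across the agencies' records."""
    merged = {}
    for adj in (adj_a, adj_b):
        for node, nbrs in adj.items():
            merged.setdefault(node, set()).update(nbrs)
    return merged


def degree_gains(adj_a, merged):
    """New links each node of network A picks up after the merge."""
    return {node: len(merged[node]) - len(nbrs) for node, nbrs in adj_a.items()}
```

In a toy case where criminal x has a single associate in jurisdiction A but three associates in jurisdiction B, x has degree 1 before the merge and gains 3 links afterwards, while nodes seen only in A gain nothing. This is how a node that looks peripheral in one jurisdiction can attract many new links once cross-jurisdictional data is added.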
Conclusions
• This study focused on topological properties of criminal
activity networks and their link to law enforcement, border and
transportation security.
• Criminal networks are small world networks with scale free
distributions. These topological characteristics have important
implications for law enforcement and hence transportation
security.
• A single jurisdiction contains incomplete information on
criminals and cross-jurisdictional data provides an increased
number of higher quality investigative leads.
Outline
• Knowledge Management using COPLINK
• Social Network Analysis of Criminal Networks
• Social Network Analysis Concepts using NetDraw
Online Sources
• Studies discussed today
– http://ai.eller.arizona.edu/paper_conf/index.htm
• Visualize your social/organizational networks
– http://www.touchgraph.com/