How Several Different Investigation
Groups Can Share a Single, Secure Database
of Rhino Horn Criminal Traffickers
Timothy C. Haas, haas@uwm.edu
Draft of May 21, 2014
1 A Federated Database
Below, a method is given for the development of a single, virtual database of all evidence
gathered on persons suspected of participating in illegal rhino horn trafficking networks
(hereafter called players or suspects). This database would hold data from several different
investigation groups who work within national parks, provincial governments, and the
national government. Reports would be generated from this database that would allow
sophisticated social network analyses to be conducted to identify key players.
The removal of these key players from these networks would cause the greatest disruption to each network’s criminal activities. Criminal trafficking networks consist of several
tiers of middlemen. Players at higher tiers receive rhino horn from several different subnetworks of poachers and runners. Combining data on these subnetworks into one network
allows analysis to identify increasingly higher-tiered players (typically those in control of
funds and with export capabilities), and to reveal the true extent of the network.
Conversely, if subnetworks are analyzed in isolation from one another, these
high-level middlemen can be misidentified as peripheral players.
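The effect described above can be illustrated with a minimal sketch. It is written in plain Python rather than in any package named in this report, the player labels (P for poachers, R for runners, M for middlemen, X for a higher-tier player) are hypothetical, and betweenness centrality is used as one possible centrality measure.

```python
from collections import deque

def betweenness(edges):
    """Unnormalized betweenness centrality of an undirected, unweighted
    network, computed with Brandes' algorithm."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest players inward.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of players was counted twice.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical subnetworks held by two different investigation groups.
sub_a = [("P1", "R1"), ("P2", "R1"), ("R1", "M1"), ("M1", "X")]
sub_b = [("P3", "R2"), ("P4", "R2"), ("R2", "M2"), ("M2", "X")]

isolated = betweenness(sub_a)          # X looks peripheral here
combined = betweenness(sub_a + sub_b)  # X bridges both subnetworks
top_player = max(combined, key=combined.get)
```

In this toy example, X has zero betweenness within either subnetwork alone but the highest betweenness once the two subnetworks are combined.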
Federation members would, at their discretion, share data with each other in the course
of an investigation. If third-tier middlemen are funding poaching raids in widely separated
parts of South Africa, the use of a federated database would allow long-distance links to
be discovered and made known to all investigation groups.
The first two steps of any federated database option are as follows. First, each investigation group’s evidence storage software system is modified to allow it to read input
from a report file generated by either MySQL or MS Access (hereafter referred to as the
database engine). Next, data entry protocols are changed so that each investigation group
enters new evidence directly into their local, secure database engine.
When a new set of evidence is received, the investigation group would enter it into
their database engine. If the investigation group possesses a criminal intelligence software
system such as IBM’s Analyst’s Notebook or SAS Law Enforcement and Public Safety
System (formerly MEMEX), they would also perform the following two steps:
1. have their database engine write an output file of this evidence,
2. read this output file into their criminal intelligence software system.
With these steps, an investigation group would enter new evidence only once into their
computers.
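The two steps above might be sketched as follows. This is an illustration only: SQLite (through Python's built-in sqlite3 module) stands in for the MySQL or MS Access database engine, the evidence table and its columns are hypothetical, and it is assumed that the criminal intelligence software system can import a comma-delimited text file.

```python
import csv
import io
import sqlite3

# Stand-in for the group's local database engine (assumption: SQLite in memory).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evidence (suspect TEXT, item_type TEXT, item_value TEXT)")
conn.executemany(
    "INSERT INTO evidence VALUES (?, ?, ?)",
    [("Suspect A", "phone", "555-0101"), ("Suspect B", "car", "ABC 123 GP")],
)

def write_report(connection, out_file):
    """Step 1: have the database engine write an output file of the evidence."""
    cursor = connection.execute(
        "SELECT suspect, item_type, item_value FROM evidence ORDER BY suspect"
    )
    writer = csv.writer(out_file)
    writer.writerow([column[0] for column in cursor.description])  # header row
    writer.writerows(cursor)

def read_report(in_file):
    """Step 2: read the output file, as the intelligence system would."""
    return list(csv.DictReader(in_file))

# An in-memory file is used here; a real run would use open(path, "w", newline="").
report = io.StringIO()
write_report(conn, report)
records = read_report(io.StringIO(report.getvalue()))
```

Because the report is generated from the database, the evidence is typed in once and the second system only ever reads the generated file.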
In the following sections, different options for implementing a federated database are
described. A working example of a zero-cost option is provided. Next, a method is given
for sending de-identified network linkage data to an outside specialist for analysis. Finally,
a workshop is outlined that would train investigation group personnel on the basics of
querying databases, and social network analysis applied to criminal network analyses.
2 Implementing a Federated Database
2.1 A Zero Cost Implementation
The following protocol could be used to query a virtual database composed of all evidence
aggregated across all investigation groups.
1. A member of the federation of investigation groups, herein called the requester, sends
an email containing a Structured Query Language (SQL) query to each federation
member.
2. Upon receipt of an email query, a member may choose to ignore it
due to lack of trust in the requester. Alternatively, the member may run the query against
their local MySQL database and return the query’s result to the requester as an encrypted file
attached to an email. Several free utilities are available on the
web that encrypt files using a shared key. Investigation groups would share this key
amongst themselves by meeting once in person, for example at a centrally located coffee shop,
and agreeing on a common key string. MySQL is a free database program (see below).
3. The requester collects all received query responses into a single data file. All of
these different, local databases taken together form a virtual, single database (often
called a federated database) of all players that these investigation groups are gathering
evidence on. Haase et al. (2010) review different architectures for federated
databases and conclude that for data that is changing frequently, querying member
databases is more efficient and less expensive than first building a master database
and then querying it.
4. The requester uses this single data file to identify all links between criminal network
players and creates a Pajek-readable link file. Pajek is a free social network analysis
software package (Nooy et al. 2011).
5. The requester reads this link file into Pajek and draws the network, and computes
its centrality measures. This analysis produces actionable intelligence in the form
of a ranked list of suspects to take into custody.
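Steps 3 and 4 of the protocol above might be sketched as follows. The sketch assumes that each member's query response has already been decrypted into a list of linked player pairs, that the player labels shown are hypothetical, and that the basic Pajek .net layout (a *Vertices block followed by an *Edges block) is sufficient for the link file.

```python
from collections import Counter

def to_pajek_net(responses):
    """Merge link lists from several federation members into the text of a
    Pajek .net file. A link reported by several members gets a higher weight."""
    weights = Counter()
    for links in responses:
        for a, b in links:
            if a != b:  # ignore self-links
                weights[tuple(sorted((a, b)))] += 1
    names = sorted({name for pair in weights for name in pair})
    index = {name: i for i, name in enumerate(names, start=1)}
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{index[name]} "{name}"' for name in names]
    lines.append("*Edges")
    for (a, b), weight in sorted(weights.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"

# Hypothetical responses from two federation members.
group_1 = [("S01", "S02"), ("S02", "S03")]
group_2 = [("S03", "S02"), ("S03", "S04")]
net_text = to_pajek_net([group_1, group_2])
# The text can then be saved, e.g. with open("links.net", "w").write(net_text),
# and read into Pajek.
```

Sorting each pair before counting treats the links as undirected, so a link reported as (S03, S02) by one member and (S02, S03) by another is merged into a single edge.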
It is possible to automate the receipt of the incoming query request email, the execution
of its query, and the generation of the outgoing email message containing the query’s
results by using a free set of macros written by Mehta and Williams (2002). This approach
requires neither exposing any group’s database to web-based attacks nor purchasing
any software beyond MS Outlook.
2.1.1 MySQL Database of Criminal Wildlife Traffickers
If an investigation group does not have the resources to purchase a criminal intelligence
database package, a free package has been written and is available at
www4.uwm.edu/people/haas/sna
This package is written for MySQL, a free database system. To run this example, a
version of MySQL that is 5.2.3 or later is needed. One download site that offers a virus-free download is
http://mysql-com.en.softonic.com/
The example consists of three files: createdatabase.sql, addtodatabase.sql, and
querydatabase.sql. The first file creates a six-table database: players, phones, cars,
guns, random identifiers, and encryption key. The second file gives an example of adding
three suspects to the database. The third file runs the query needed to create a file
suitable for social network analysis.
The query file performs two tasks as follows.
1. First, internally generated random identifier numbers are assigned to suspects, thus
relieving the data manager from having to maintain a log relating local suspect IDs
to local suspect names. These random IDs are then used in lieu of suspect names
during the creation of the first output file of the query.
Note, however, that random identifiers are not a substitute for encrypted names in
a federated database because there is no guarantee that the same random identifier
will be assigned to the same suspect who is present in the local databases of two or
more investigation groups (see the next section).
2. Second, using a given key, a second output file of the query is written wherein
suspect names have been encrypted via an algorithm that is currently considered to
be unbreakable: the AES encryption algorithm of Daemen and Rijmen (2003).
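The difference between the two output files can be sketched as follows. This is not the code of the package described above: the suspect names and the key are hypothetical, and because Python's standard library has no AES implementation, a keyed HMAC-SHA256 digest stands in for AES encryption. Unlike AES, the digest cannot be decrypted, but it shares the property that matters here: the same name and the same key always give the same codename.

```python
import hashlib
import hmac
import secrets

def random_ids(names):
    """Task 1: assign an independent random identifier to each suspect."""
    return {name: secrets.token_hex(4) for name in names}

def codename(name, key):
    """Task 2 (stand-in): derive a keyed codename from a suspect name.
    Case and spacing are normalized so trivial variants map to one codename."""
    normalized = " ".join(name.lower().split())
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

shared_key = b"example-shared-key"   # hypothetical; agreed on in person

# Hypothetical suspect lists of two investigation groups.
group_a = ["John Doe", "Jane Roe"]
group_b = ["john  doe", "Sam Poe"]

# Random identifiers are drawn independently by each group, so nothing ties
# group A's identifier for John Doe to group B's.
ids_a = random_ids(group_a)
ids_b = random_ids(group_b)

# Keyed codenames agree across groups, so the overlap can be detected
# without either group revealing a name.
codes_a = {codename(name, shared_key) for name in group_a}
codes_b = {codename(name, shared_key) for name in group_b}
shared_suspects = codes_a & codes_b
```

Here the two groups hold exactly one suspect in common, and only the keyed codenames reveal it.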
2.2 A Low Cost Implementation
To have dedicated database support, each federation member would also purchase MS
Access or MS Office (MS Access is bundled in this package). In addition to Microsoft
support for MS Access, there is a large user base with several active forums.
2.3 More Expensive Alternatives
Federation members could achieve secure transmissions by jointly purchasing a Virtual
Private Network (VPN). Also, there are many web-based database systems that offer
greater opportunities for data integration, such as one based on a set of distributed
Microsoft SQL Servers.
These higher-performance solutions, however, are more expensive and require stronger
Information Technology (IT) support.
3 De-identifying Data for Outside Analysis
There are many criminal network analyses that Pajek cannot perform, such as (a) automatic
weighting of links by each player’s number of phones, cars, or guns, (b) reconstruction
of the network (prediction of unobserved links and group memberships), (c)
prediction of who will succeed an arrested player, and (d) derivation of an optimal arrest
strategy. The services of an outside specialist may be needed to perform such analyses.
A data file sent to any outside specialist, however, should not contain any classified,
private, or confidential information. The action of replacing suspect names in a database
with encrypted or random identifiers is referred to as de-identifying or de-classifying the
database.
If the network is changing frequently through time, such analyses may need to be rerun
every week. In this case, de-identification needs to be automatic. One approach
is as follows.
1. An SQL query script would be provided to each investigation group by the outside
specialist. This script, when run against a federation member’s local database,
would de-identify each suspect’s name by replacing the name with an encrypted
name (hereafter called a codename). All investigation groups would use
the same encryption key, as discussed above, and these codenames could not be reverse
engineered back to the original names by anyone not having the secret key.
As in the discussion above, because this name encryption step assigns the same
codename to a given player name in every local database, all of these different, local
databases taken together would form a virtual, single database of all players that these
investigation groups are gathering evidence on.
A second approach is to use a commercial cipher program. Two such programs are:
www.littlelite.net/ncryptxl and
www.extendoffice.com/order/kutools-for-excel.html
Finally, a third alternative is to use a bit manipulation cipher in Excel. If the
AES algorithm is not used, the next best option is a bit manipulation (xor)
cipher with a key that is almost as long as the total number of characters in the
complete suspect name list. Such a cipher is very hard to break because
it approximates a one-time pad cipher. This is what is programmed in the Excel VBA
code deident.bas that is available at
www4.uwm.edu/people/haas/sna
To install this macro, do the following:
(a) Start Excel.
(b) ??
The Excel spreadsheet, deident.xlsm, on the above website contains this macro.
Note that all three alternatives will support a federated database only if all participants
use the same key. Analyst’s Notebook has a declassify link chart function that
replaces all names on a link chart with non-traceable identifier strings. But this feature
will not support a federated database because the cipher key used by Analyst’s
Notebook is not necessarily the same across all federation members.
2. The SQL query script would ask for:
(a) each pair of players that are linked through (say) an intercepted phone call,
(b) how many phones, and/or cars, and/or guns each player has, and
(c) for each pair of players, the number of evidence items that mention both of
them.
3. Each investigation group would then create an email message containing this
de-identified data file and send it to the outside specialist for social network analysis.
Upon receipt of all data files from cooperating federation members, the specialist
would:
(a) aggregate all data files into one file,
(b) analyze the aggregated network,
(c) share the results of the analysis with all investigation groups.
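The requests listed in step 2 above might be expressed in SQL along the following lines. This is a sketch under stated assumptions: SQLite (through Python's built-in sqlite3 module) stands in for a member's MySQL database, and the mentions and phones tables, their columns, and the codenames c1 to c3 are hypothetical rather than the schema created by createdatabase.sql.

```python
import sqlite3

# Stand-in for one federation member's local, already de-identified data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mentions (evidence_id INTEGER, codename TEXT);
CREATE TABLE phones (codename TEXT, phone TEXT);
INSERT INTO mentions VALUES
    (1, 'c1'), (1, 'c2'),
    (2, 'c1'), (2, 'c2'), (2, 'c3'),
    (3, 'c3');
INSERT INTO phones VALUES ('c1', 'p1'), ('c1', 'p2'), ('c2', 'p3');
""")

# Request (c): for each pair of players, the number of evidence items that
# mention both. The condition a.codename < b.codename lists each pair once.
pair_counts = conn.execute("""
SELECT a.codename, b.codename, COUNT(DISTINCT a.evidence_id) AS shared_items
FROM mentions AS a
JOIN mentions AS b
  ON a.evidence_id = b.evidence_id AND a.codename < b.codename
GROUP BY a.codename, b.codename
ORDER BY a.codename, b.codename
""").fetchall()

# Request (b), for phones only: how many phones each player has.
phone_counts = conn.execute("""
SELECT codename, COUNT(*) AS n_phones
FROM phones
GROUP BY codename
ORDER BY codename
""").fetchall()
```

Only codenames and counts leave the member's computer, so the file sent to the specialist carries link structure but no names.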
Because a particular suspect’s name may differ to some degree across the databases
of the federation, the specialist will replace a pair of suspects with one suspect if the pair
have nearly identical addresses and, optionally, car registration numbers.
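One way such a matching rule could work is sketched below. The similarity measure (Python's difflib.SequenceMatcher), the 0.85 threshold, and the records shown are all assumptions made for illustration; the report does not specify how "nearly identical" is to be measured.

```python
from difflib import SequenceMatcher

def normalize(address):
    """Lower-case an address and collapse punctuation and extra spaces."""
    cleaned = address.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())

def same_suspect(rec_a, rec_b, threshold=0.85):
    """Treat two records as one suspect if their addresses are nearly
    identical and, when both records list a car registration, those agree."""
    ratio = SequenceMatcher(
        None, normalize(rec_a["address"]), normalize(rec_b["address"])
    ).ratio()
    if ratio < threshold:
        return False
    car_a, car_b = rec_a.get("car"), rec_b.get("car")
    if car_a and car_b:
        return car_a == car_b
    return True

# Hypothetical records from the databases of different federation members.
rec_1 = {"codename": "c1", "address": "12 Kruger Street, Nelspruit", "car": "ABC 123 GP"}
rec_2 = {"codename": "c7", "address": "12 Kruger St, Nelspruit", "car": "ABC 123 GP"}
rec_3 = {"codename": "c9", "address": "7 Marula Road, Hoedspruit"}

merge_1_2 = same_suspect(rec_1, rec_2)  # near-identical address, same car
merge_1_3 = same_suspect(rec_1, rec_3)  # different address
```

A lower threshold merges more aggressively and risks joining two distinct suspects, so the value would need to be tuned against records whose identity is already known.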
4 Summary
The tasks listed below would enable investigation groups to implement a federated database
and grasp the essentials of social network analysis.
1. Agree to set up a federated database to gather information on organized criminals
engaged in wildlife trafficking.
2. Implement the federated database in either MySQL or Excel.
3. Compute social network analysis centrality measures on this federated database to
identify the network’s current set of key players.
4. Also using this federated database, predict unobserved links, predict who will succeed an arrested player, and develop an optimal strategy for arresting suspects.
References
Daemen, J., and Rijmen, V. (2003), The Rijndael Block Cipher, AES Proposal. Retrieved
December 20, 2013, from
http://csrc.nist.gov/archive/aes/rijndael/Rijndael-ammended.pdf
Haase, P., Mathäß, T., and Ziller, M. (2010), “An Evaluation of Approaches to Federated
Query Processing over Linked Data,” I-SEMANTICS ’10 Proceedings of the 6th
International Conference on Semantic Systems, Article No. 5, September 1-3, Graz,
Austria. Retrieved November 8, 2013 from
http://dl.acm.org/citation.cfm?id=1839713
Mehta, A. and Williams, D. (2002), “SQL and Outlook: Enable Database Access and
Updates through Exchange and Any E-mail Client,” MSDN Magazine, Microsoft
Corporation. Retrieved November 8, 2013 from
msdn.microsoft.com/en-us/magazine/cc301799.aspx
Nooy, Wouter de, Mrvar, A., and Batagelj, V. (2011), Exploratory Social Network Analysis
with Pajek, Second Edition, Cambridge, U.K.: Cambridge University Press. Software
may be freely downloaded from
http://pajek.imfm.si Retrieved May 11, 2013.