Macro Trends in Counter-Terrorism Technologies
And Thoughts on Responsible Innovation
DETECTER Project, Brussels
September 7th, 2011
Jeff Jonas, IBM Distinguished Engineer
Chief Scientist, IBM Entity Analytics
JeffJonas@us.ibm.com
Today’s Material
• Background
• Macro Trends
• Detecting Bad Guys in Big Data
• Challenging Privacy and Civil Liberties Issues
• Privacy by Design (PbD) Considerations
• Questions and Answers
Background
• Early 1980s: Founded Systems Research & Development (SRD), a custom software consultancy
• 1989 – 2003: Built numerous systems for Las Vegas casinos, including a technology known as Non-Obvious Relationship Awareness (NORA)
• 2001/2003: Funded by In-Q-Tel
• 2005: IBM acquires SRD; now Chief Scientist, IBM Entity Analytics
• Cumulatively: I have had a hand in a number of systems with multiple billions of rows describing hundreds of millions of entities
Roles
• Member, Markle Foundation Task Force on National Security in the Information Age
• Board Member, US Geospatial Intelligence Foundation (USGIF), the GEOINT organizing body
• Senior Associate, Center for Strategic and International Studies (CSIS)
• Member, EPIC advisory board
• Advisor, Privacy International
Current Primary Area of Interest
• Making sense of information in large data sets, across complex ecosystems, with emphasis on privacy and civil liberties protections
– 1996: Created an identity-centric customer repository based on 4,200 disparate systems … >100 million resolved identities
– 2001: Assisted with various post-9/11 data analysis programs for the public and private sectors
– 2005: Missing persons project following Hurricane Katrina, resulting in the reunification of >100 loved ones
A Late Bloomer to Privacy
• 1980 – 2001: No clue whatsoever
• 2001 – 2006: Slowly waking up
• 2007 – 2011: Today, at best, a student of privacy
A Journey Fraught with Reflection and Rethinking
The greater my privacy and civil liberties awareness, the greater the number of imperfections that appear in my rearview mirror.
Katrina – Missing Persons Reunification Project
• Information about the status of persons quickly ends up scattered across countless databases
– Over 50 such web sites/organizations were identified as having victim-related data
– Many people were registered multiple times in the same database
– Many people were registered multiple times across databases
– Many people were registered as missing in one database and found in another
• Connecting found persons previously reported as missing becomes nearly impossible
– Too many databases
– Constantly changing data
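The reconciliation problem above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual method. Each record is assumed to carry a name, a date of birth and a status of "missing" or "found". The sample records are invented. The matching key (a lowercased, whitespace-collapsed name plus date of birth) is far cruder than the fuzzy matching real registries would need.

```python
from collections import defaultdict

# Hypothetical records from several registries: (source, name, date_of_birth, status)
RECORDS = [
    ("registry_a", "Mary  Smith", "1950-04-02", "missing"),
    ("registry_a", "mary smith", "1950-04-02", "missing"),  # duplicate within one source
    ("registry_b", "Mary Smith", "1950-04-02", "found"),    # found in another source
    ("registry_b", "John Doe", "1971-09-30", "missing"),
    ("registry_c", "John Doe", "1971-09-30", "missing"),    # duplicate across sources
]

def person_key(name, dob):
    """Naive matching key: lowercased, whitespace-collapsed name plus date of birth."""
    return (" ".join(name.lower().split()), dob)

def reconcile(records):
    """Group records by person; list people reported missing in one place and found in another."""
    people = defaultdict(list)
    for source, name, dob, status in records:
        people[person_key(name, dob)].append((source, status))
    reunitable = sorted(
        key for key, sightings in people.items()
        if {"missing", "found"} <= {status for _, status in sightings}
    )
    return people, reunitable

people, reunitable = reconcile(RECORDS)
print(len(RECORDS), "records,", len(people), "unique persons")  # 5 records, 2 unique persons
print(reunitable)  # [('mary smith', '1950-04-02')]
```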
Katrina Reunification Project Statistics
• Total data sources: 15
• Usable records: 1,570,000
• Unique persons: 36,815
• Total loved ones reunited: >100
Katrina – Missing Persons Reunification Project
• Privacy by Design (PbD)
– Contractually authorized to delete all the data after the reunification office completed its work
– Hence, a few months later, all collected data and reporting products were deleted
DESTRUCTION OF EVIDENCE!
Data Decommissioning – Destruction of Accountability
Macro Trends
Good News: The World is Not More Dangerous
• Average age: 37 in 1900 (Western Europe); 67 today (global average)
• Number dead: 75M (~17+% of the world’s population) in the 1300s “Black Death”; 300M (~4.5%) today if America sank into the ocean and everyone died
Prediction: Your doctor is 102 and this is not weird.
Bad News: “More Death Cheaper in Future” Graph
(Figure: death toll plotted against complexity of execution, comparing a 10 kiloton nuke with the 1918 Spanish influenza.)
1918 Spanish Influenza Genome
(Figure, repeated: execution becoming easier while yielding more death = Bad.)
Jerome Kerviel – US$7B
(Diagram spanning 1 day: Back it out, Analytic Checkpoint, Back it in, Back it out, Analytic Checkpoint, Back it in)
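One reading of the diagram is that an analytic check which inspects only a snapshot at each checkpoint never sees a position that is backed out just before the checkpoint and backed in just after it. The Python sketch below is hypothetical throughout. The trade events, hours and checkpoint times are invented, and the events are assumed to be sorted by hour. It contrasts snapshot-only checks with a check that inspects every event as it arrives.

```python
# Hypothetical trade events over one day, sorted by hour: (hour, action, trade_id)
EVENTS = [
    (9, "add", "T1"),
    (11, "remove", "T1"),   # backed out before the noon checkpoint
    (13, "add", "T1"),      # backed in after it
    (16, "remove", "T1"),   # backed out before the 17:00 checkpoint
    (18, "add", "T1"),      # backed in again
]
CHECKPOINTS = [12, 17]

def snapshot_alerts(events, checkpoints, suspicious):
    """Flag suspicious trades only if they are open at a checkpoint."""
    alerts = []
    for cp in checkpoints:
        open_trades = set()
        for hour, action, trade in events:
            if hour > cp:
                break
            if action == "add":
                open_trades.add(trade)
            else:
                open_trades.discard(trade)
        alerts.extend((cp, t) for t in sorted(open_trades & suspicious))
    return alerts

def streaming_alerts(events, suspicious):
    """Flag suspicious trades the moment they are entered, whatever the hour."""
    return [(hour, t) for hour, action, t in events if action == "add" and t in suspicious]

print(snapshot_alerts(EVENTS, CHECKPOINTS, {"T1"}))  # []
print(streaming_alerts(EVENTS, {"T1"}))              # [(9, 'T1'), (13, 'T1'), (18, 'T1')]
```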
2050 Predictions
A single person can kill 100M people for <$1,000.
State of the Union: Enterprise Amnesia
Amnesia, definition: A defect in memory, especially resulting from brain damage.
US National Security Amnesia Events
• 9/11: Two known terrorists were admitted into the US (only discovered after the fact).
• Christmas Day Bomber: Abdulmutallab held a multiple-entry visa while at the same time being on the terrorist watch list (only discovered after the fact).
Trend: Organizations Are Getting Dumber
“Every two days now we create as much information as we did from the dawn of civilization up until 2003.” ~ Eric Schmidt, CEO Google
(Figure, plotted over time: computing power growth and the available observation space climb far faster than context and sensemaking algorithms; the widening gap is labeled Enterprise Amnesia. A follow-up slide repeats the figure and asks: WHY?)
Algorithms at Dead End.
You Can’t Squeeze Knowledge Out of a Pixel.
No Context
scrila34@msn.com
Context, definition: Better understanding something by taking into account the things around it.
Information without context is hardly actionable.
Lack of Context – Consequences
• Alert queues grow faster than humans can address them, and they are filled mostly with false positives
• The top item in the queue is not the most relevant item
• Items require so much investigative effort that they are often abandoned prematurely
• Risk assessment becomes the risk
Information in Context … and Accumulating
(Diagram: the email address scrila34@msn.com connects a Job Applicant record, a No Fly List entry, a Most Trusted Source and a Known Terrorist.)
The Puzzle Metaphor
• Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors
• What it represents is unknown – there is no picture
• Is it one puzzle, 15 puzzles, or 1,500 different puzzles?
• Some pieces are duplicates, missing, incomplete, low quality, or have been misinterpreted
• Some pieces may even be professionally fabricated lies
• Until you take the pieces to the table and attempt assembly, you don’t know what you are dealing with
Puzzling: 4 Puzzles, 620 Useful Pieces
• 270 pieces (90%)
• 200 pieces (66%)
• 150 pieces (50%)
• 30 pieces (10%): duplicates
• 6 pieces (2%): pure noise
+36 Useless Pieces!
First Discovery
More Data Finds Data
Duplicates in Front of Your Eyes
First Duplicate Found Here
Incremental Context – Incremental Discovery
• 6:40pm: START
• 22min: “Hey, this one is a duplicate!”
• 35min: “I think some pieces are missing.”
• 37min: “Looks like a bunch of hillbillies on a porch.”
• 44min: “Hillbillies, playing guitars, sitting on a porch, near a barber sign … and a banjo!”
• 47min: “We should take the sky and grass off the table.”
• 2hr: “Let’s switch sides, and see if we can make sense of this from different perspectives.”
• 2hr10m: “Wait, there are three … no, four puzzles.”
• 2hr17m: “We need a bigger table.”
• 2hr18m: “I think you threw in a few random pieces.”
Trend: Big Data [in context] = New Physics
• More data: better predictions
– Lower false positives
– Lower false negatives
• More data: bad data … good
– Suddenly glad your data was not perfect
• More data: less compute
From Pixels to Pictures to Insight
(Diagram: Observations → Contextualization → Persistent Context → Relevance Detection → Consumer (an analyst, a system, the sensor itself, etc.))
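Here is a toy Python version of the observations → persistent context → relevance detection → consumer loop. It rests on three assumptions. Observations are plain dictionaries. Contextualization simply links a new observation to earlier ones that share a feature value. The relevance rule is hypothetical: alert when a job applicant and a No Fly List entry share a feature, built from the scrila34@msn.com example. The `PersistentContext` class is illustrative and is not an IBM API.

```python
from collections import defaultdict

class PersistentContext:
    """Accumulates observations and indexes them by feature value."""

    def __init__(self, consumer):
        self.by_feature = defaultdict(list)   # feature value -> observations seen so far
        self.consumer = consumer              # callable notified of relevant findings

    def observe(self, obs):
        # Contextualize: find earlier observations sharing any feature value.
        related = {id(o): o for v in obs["features"] for o in self.by_feature[v]}
        for v in obs["features"]:
            self.by_feature[v].append(obs)
        # Relevance detection: the new data finds the old data, and the finding is pushed to the consumer.
        for other in related.values():
            if {obs["kind"], other["kind"]} == {"job_applicant", "no_fly_list"}:
                self.consumer(f"{obs['id']} relates to {other['id']}")

alerts = []
ctx = PersistentContext(alerts.append)
ctx.observe({"id": "APP-1", "kind": "job_applicant", "features": {"scrila34@msn.com"}})
ctx.observe({"id": "NFL-7", "kind": "no_fly_list", "features": {"scrila34@msn.com"}})
print(alerts)  # ['NFL-7 relates to APP-1']
```

The alert fires when the later No Fly List entry arrives, without anyone re-querying the earlier applicant record.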
One Form of Context is “Expert Counting”
• Is it 5 people each with 1 account … or is it 1 person with 5 accounts?
• Is it 20 cases of H1N1 in 20 cities … or one case reported 20 times?
• If one cannot count … one cannot estimate vector or velocity (direction and speed).
• Without vector and velocity … prediction is nearly impossible.
Skilled adversaries engage in “channel separation.”
(Diagram: Cell Phone #1, Cell Phone #2, Bank Acct #1 and Passport #1 appear as separate channels, registered to “Billy K.”, “William A.” or an unknown holder.)
Hence, detection requires “channel consolidation.”
William A. aka Billy K.
• Cell Phone #1
• Cell Phone #2
• Bank Acct #1
• Passport #1
Expert Counting: Degrees of Difficulty
• Exactly same: Bob Jones 123455 vs. Bob Jones 123455
• Fuzzy: Bob Jones 123455 vs. Robert T Jonnes 000123455
• Incompatible features: Bob Jones 123455 vs. bjones@hotmail
• Deceit: Bob Jones 123455 vs. Ken Wells 550119
Deceit Detection Using Context Accumulation
(Diagram: features accumulate as records arrive.)
• Robert Jones, 123455, POB 13452, DOB 03/12/73
• Bob Jones, POB 13452, gw3e56@hotmail.com
• Ken Wells, 550119, POB 999911, DOB 03/12/73, gw3e56@hotmail.com
Resolved! Robert Jones (123455) and Ken Wells (550119) are the same person.
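The accumulation effect can be sketched with a greedy resolver in Python. All of the assumptions here are hypothetical. Records are dictionaries of typed features. Names are normalized with a one-entry nickname table and are assumed to be exactly two tokens. A record joins an existing entity when it shares at least two features with everything that entity has accumulated. Taken one at a time, Ken Wells shares only one feature with each earlier record, yet he resolves against the accumulated entity.

```python
NICKNAMES = {"bob": "robert"}   # hypothetical nickname table

def features(record):
    """Turn a record into a set of (field, value) features, normalizing first names."""
    feats = set()
    for field_name, value in record.items():
        if field_name == "name":
            first, last = value.lower().split()   # assumes "First Last"
            value = f"{NICKNAMES.get(first, first)} {last}"
        feats.add((field_name, value))
    return feats

def resolve(records, min_shared=2):
    """Greedy resolution: a record joins the first entity whose accumulated
    features share at least `min_shared` features with it."""
    entities = []
    for rec in records:
        f = features(rec)
        match = next((e for e in entities if len(e["features"] & f) >= min_shared), None)
        if match is None:
            entities.append({"names": [rec["name"]], "features": f})
        else:
            match["names"].append(rec["name"])
            match["features"] |= f   # feature accumulation
    return entities

ROBERT = {"name": "Robert Jones", "id": "123455", "pob": "13452", "dob": "03/12/73"}
BOB = {"name": "Bob Jones", "pob": "13452", "email": "gw3e56@hotmail.com"}
KEN = {"name": "Ken Wells", "id": "550119", "pob": "999911",
       "dob": "03/12/73", "email": "gw3e56@hotmail.com"}

print(len(features(KEN) & features(ROBERT)), len(features(KEN) & features(BOB)))  # 1 1
print([e["names"] for e in resolve([ROBERT, BOB, KEN])])
# [['Robert Jones', 'Bob Jones', 'Ken Wells']]
```

This greedy version is order-sensitive. If Ken Wells arrives first, it never connects him to the later records. A production resolver would also revisit earlier decisions as new context arrives.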
3 Models for Information Sharing
1. Bulk Transfer
• Large collections are passed along to appropriate third parties
• May be required if the recipient must commingle the data in secret
• The recipients must have a capacity much larger than their own native requirements
• The more copies, the more difficult it is to keep the information current across the ecosystem
• The more copies, the more difficult it is to prevent unintended disclosure
• Useful when the number of recipients and transactional volumes are very small
2. Services for Inquiry
• Owners enable third-party inquiry (human or machine lookups)
• When many systems are integrated, federated search can be automated to search all third-party data sources based on a single user/machine search
• Each system in the federation must be sized for the entire federation’s query volume
• Third-party systems often lack the necessary indexes
• Nearly impossible to ensure each federated system is online
• Useful for periodic, on-demand inquiry using each third-party data source like a reference system; particularly appropriate for narrow investigative work and/or forensic analysis
• Not that useful for detect/preempt missions
3. Central Catalog/Index
• Parties interested in information sharing supply metadata to a central catalog (index)
• Inquiries can discover the location of all available documents using a single lookup
• Card catalogs provide pointers to source systems and documents, enabling efficient, scalable lookup (aka federated fetch)
• Easier to keep the data current than with bulk transfer
• Scales massively
• Easier to secure
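Here is a minimal Python sketch of the central catalog model, under stated assumptions. The catalog stores only a subject key, a record type, the holder’s name and a pointer. The normalized subject key, holder names and permission sets are hypothetical. The decision to release a full record stays with the holder at fetch time. The record IDs reuse the #A-701 and #B-9103 example from later in this deck.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    """Metadata a participant publishes to the central catalog (hypothetical schema)."""
    who: str       # normalized subject key
    what: str      # kind of record
    holder: str    # organization holding the full record
    pointer: str   # record ID inside the holder's own system

class CentralCatalog:
    """Central index holding metadata and pointers only, never the full records."""

    def __init__(self):
        self._index = {}

    def publish(self, entry):
        self._index.setdefault(entry.who, []).append(entry)

    def discover(self, who):
        """A single lookup reveals every holder with something on `who`."""
        return list(self._index.get(who, []))

def federated_fetch(entry, holders, requester):
    """Follow a pointer back to its holder, which decides at request time whether to release."""
    holder = holders[entry.holder]
    if requester not in holder["allowed"]:
        return None
    return holder["records"][entry.pointer]

HOLDERS = {  # hypothetical holders and permissions
    "hr": {"allowed": {"fraud_team"}, "records": {"A-701": {"name": "Mark Randy Smith"}}},
    "fraud": {"allowed": set(), "records": {"B-9103": {"name": "M. Randal Smith"}}},
}
catalog = CentralCatalog()
catalog.publish(CatalogEntry("smith|06/07/74", "employee", "hr", "A-701"))
catalog.publish(CatalogEntry("smith|06/07/74", "fraud case", "fraud", "B-9103"))

hits = catalog.discover("smith|06/07/74")
print([(h.holder, h.pointer) for h in hits])  # [('hr', 'A-701'), ('fraud', 'B-9103')]
print([federated_fetch(h, HOLDERS, "fraud_team") for h in hits])
# [{'name': 'Mark Randy Smith'}, None]
```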
Discovery at the Library
Subject, Title, Author
Enterprise Discovery
Who, What, Where, When, How
The Policy Focus Becomes … “Discoverability”
If you don’t publish your metadata (who, what, where, when) to the enterprise catalog, your information is not discoverable. Therefore, the value of your operational system to the broad strategic interests of the enterprise is effectively ZERO!
Are You Playing Well With Others?
SHARING SCORECARD(*): DISCOVERABILITY
Organization       Records   Discoverable   %
This org           5B        2.5B           50%
That org           120B      6B             5%
The other org      3B        1B             33%
Their org          1B        750K           75%
Their other org    1B        500K           50%
(*) Any resemblance to real organizations and real numbers would be coincidental
Challenging Privacy and Civil Liberties Issues
Issue #1: Essential Secrets vs. Transparency
• To detect professionally fabricated lies using only data, one must either:
1. Collect observations the adversary doesn’t know you have, or
2. Be able to compute over your observations in a manner the adversary cannot fathom
• The Challenge: How can organizations catch bad guys if there is transparency over their observational space and what is computable?
Issue #2: More Data Good
• The good news: Both those in the counterterrorism business and the privacy community detest false positives equally
– The government recognizes that false positives waste government resources
– The privacy community recognizes that false positives place the innocent under undeserved government scrutiny
• The challenge: There are two remedies for false positives
1. Change the rules to reduce the number of alerts (which increases the false negatives)
2. Add more information so that the additional context permits greater discrimination
• The more data, the lower the false positives and the lower the false negatives
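The two remedies can be compared on a toy example in Python. Every candidate, score and threshold below is invented for illustration. There is one watch-list entry and five screened travelers. Each traveler has a name-similarity score, a flag for whether their date of birth agrees, and a flag for whether they really are the listed person. Tightening the name-only rule trades false positives for false negatives. Adding the date of birth as context removes both in this example.

```python
# Hypothetical scored candidates: (name_similarity, dob_agrees, truly_same_person)
CANDIDATES = [
    (1.0, False, False),  # same name, different person
    (1.0, False, False),
    (0.8, False, False),  # similar name, different person
    (0.8, True, True),    # the listed person, name misspelled
    (1.0, True, True),    # the listed person, exact name
]

def errors(alert_rule):
    """Count (false positives, false negatives) for an alerting rule over CANDIDATES."""
    fp = sum(1 for c in CANDIDATES if alert_rule(c) and not c[2])
    fn = sum(1 for c in CANDIDATES if not alert_rule(c) and c[2])
    return fp, fn

print(errors(lambda c: c[0] >= 0.75))           # name only, loose:  (3, 0)
print(errors(lambda c: c[0] >= 0.9))            # name only, strict: (2, 1)
print(errors(lambda c: c[0] >= 0.75 and c[1]))  # name + DOB:        (0, 0)
```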
Issue #3: Necessity of Central Indexes
• Federated search is extremely limited
– Does not scale when the mission is to get “left of boom” (detection)
• Central card catalogs (indexes) are the only viable way forward
– Only the metadata is centralized, with pointers, not all the data
• The Challenge: The general reaction to central databases, even if they are just an index
Issue #4: Lone Gunmen Surveillance
• Rare events planned by one person or a small group are more difficult to detect
• The size of the observation space needed to detect lone gunmen planning acts of terrorism … approaches ubiquitous surveillance
• Risk-based surveillance
– A car bomb in a public place
– A sector of national infrastructure at risk
– WMD over a major city
• The Challenge: At some point, when one person can create extraordinary damage, cheaply, without a trace … then what?
Issue #5: Fewer Secrets Lead to Chilling Effects?
• It is becoming harder and harder to have secrets
• Will this chill behavior?
– Will population behavior gravitate towards the center of the bell curve?
– Or will mankind become more tolerant of diversity?
Privacy by Design (PbD) Considerations
Universal Declaration of Human Rights
• Article 9
No one shall be subjected to arbitrary arrest, detention or exile.
• Article 12
No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honor and reputation. Everyone has the right to the protection of the law against such interference or attacks.
• Article 15
(1) Everyone has the right to a nationality.
(2) No one shall be arbitrarily deprived of his nationality nor denied the right to change his nationality.
• Article 17
(1) Everyone has the right to own property alone as well as in association with others.
(2) No one shall be arbitrarily deprived of his property.
PbD: Information Attribution
• Avoid receiving any data that does not come with the ability to track its pedigree/attribution.
• When passing your data into secondary systems, pass the data pedigree/attribution along to the recipient (even if that means only a pointer to your copy).
• If the ‘chain of where data came from’ is not maintained in the information sharing ecosystem, there is no hope of keeping the data current, and cross-system consistency is very difficult to reconcile.
More here:
Source Attribution, Don’t Leave Home Without It
Out-bound Record-level Accountability in Information Sharing Systems
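Here is a minimal Python sketch of carrying attribution forward. The `Record` type, the system names and the record IDs are all hypothetical. The sketch shows only two behaviors: a receiver refuses data that arrives without a pedigree, and it appends its own hop to the pedigree before passing the data on.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    payload: dict
    pedigree: list = field(default_factory=list)   # [(system, record_id), ...] oldest first

def receive(system, record_id, record):
    """Accept a record only if it arrives with attribution, then add this hop to the chain."""
    if not record.pedigree:
        raise ValueError(f"{system}: refusing record without pedigree")
    return Record(dict(record.payload), record.pedigree + [(system, record_id)])

original = Record({"name": "Bob Jones"}, [("system_of_record", "R-1")])
copy_b = receive("secondary", "S-9", original)
copy_c = receive("tertiary", "T-4", copy_b)
print(copy_c.pedigree)
# [('system_of_record', 'R-1'), ('secondary', 'S-9'), ('tertiary', 'T-4')]
```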
PbD: Data Destruction
• When the data is no longer needed, or there is a mandate, purge it.
• For example, at the close of a special information analysis project, consider decommissioning the data sets in proportion to the consequences of unintended disclosure or misuse.
• If there is a legal requirement to retain data, or long-term accountability is necessary, consider pushing the data into forms of retrieval useful only for forensic/investigatory purposes.
More here:
Decommissioning Data: Destruction of Accountability
PbD: Limit Data Transfers
• If you don’t have to move the entire record, don’t.
• Using information sharing systems as an example, it is best not to send all the data to each (and every) information sharing partner. Better to create a central index with prescribed fields. The index then points to the original data holder, and getting access to the original record requires permission, at that time, from the original data holder. This ensures a degree of transparency.
More here:
Discoverability: The First Information Sharing Principle
PbD: Data Tethering
• When data is moved from systems of record out into secondary systems, these secondary systems should be notified as the source data changes (adds, changes and deletes).
• If the secondary systems have themselves forwarded the data to tertiary systems, these same changes should be passed through the entire food chain.
More here:
Data Tethering: Managing the Echo
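Here is a minimal in-memory sketch of data tethering in Python. `TetheredStore` is hypothetical. A real deployment would send change notifications over a network or a queue. It would also need to guard against forwarding cycles. This sketch assumes there are none, and a cycle here would recurse forever.

```python
class TetheredStore:
    """A copy of records that forwards every add/change/delete to its downstream copies."""

    def __init__(self, name):
        self.name = name
        self.records = {}
        self.downstream = []

    def forward_to(self, other):
        self.downstream.append(other)

    def apply(self, op, key, value=None):
        if op in ("add", "change"):
            self.records[key] = dict(value)
        elif op == "delete":
            self.records.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        for store in self.downstream:   # pass the change through the whole food chain
            store.apply(op, key, value)

source = TetheredStore("system of record")
secondary = TetheredStore("secondary")
tertiary = TetheredStore("tertiary")
source.forward_to(secondary)
secondary.forward_to(tertiary)

source.apply("add", "P-1", {"status": "missing"})
source.apply("change", "P-1", {"status": "found"})
print(tertiary.records)   # {'P-1': {'status': 'found'}}
source.apply("delete", "P-1")
print(tertiary.records)   # {}
```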
PbD: Obfuscate Data
• Every copy increases the risk of unintended disclosure.
• When there is an opportunity to perform data masking, anonymization or encryption … do it.
• Techniques now exist whereby data can first be obfuscated (e.g., encrypted, anonymized, masked) before information transfer, while still maintaining the capability to perform deep analytics (e.g., data matching) after obfuscation.
More here:
To Anonymize or Not Anonymize, That is the Question
Maximizing Discovery - Minimizing Disclosure
(Diagram: sensors send observations into persistent context as hashed features only. Employee Database Record #A-701 contributes Cd5dced41028cb…, 00c9782a552a2…, 7f2b6e48ea7d0…; Fraud Database Record #B-9103 contributes 0d06b31faa7c…, B5e341a4b0c…, 00c9782a552…. The shared feature 00c9782a552… raises an alert.)
(Diagram, same flow in clear text: Employee Database Record #A-701 is Mark Randy Smith, DOB 06/07/74, 123 Main Street, 713 731 5577; Fraud Database Record #B-9103 is M. Randal Smith, DOB 06/07/74, 713 731 5577. Policy controls govern the discovery: Record #A-701 matches Record #B-9103.)
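One way to get the behavior on these slides is to apply keyed one-way hashing to normalized features before transfer. The Python sketch below does this with the standard `hmac` and `hashlib` modules. This is an assumed technique, not a description of the system pictured. The hex strings on the slides are not produced by this code, and the shared key is a placeholder that both parties are assumed to hold. Matching only works on features that both sides normalize identically. The sketch uses composite features (name + DOB, DOB + phone) because a DOB alone would match far too many people.

```python
import hashlib
import hmac

SHARED_KEY = b"placeholder-key-agreed-out-of-band"   # hypothetical; a real key must stay secret

def digest(kind, value):
    """Keyed one-way hash (HMAC-SHA256) of a normalized feature value."""
    normalized = " ".join(value.lower().replace(".", "").split())
    return hmac.new(SHARED_KEY, f"{kind}:{normalized}".encode(), hashlib.sha256).hexdigest()

def anonymize(record):
    """Replace clear-text fields with hashed composite features before transfer."""
    feats = {digest("name+dob", f"{record['name']}|{record['dob']}"),
             digest("dob+phone", f"{record['dob']}|{record['phone']}")}
    if "address" in record:
        feats.add(digest("address", record["address"]))
    return feats

employee = {"name": "Mark Randy Smith", "dob": "06/07/74",
            "address": "123 Main Street", "phone": "713 731 5577"}
fraud = {"name": "M. Randal Smith", "dob": "06/07/74", "phone": "713 731 5577"}

shared = anonymize(employee) & anonymize(fraud)
print(len(shared))   # 1: the dob+phone feature matches; the names do not
```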
PbD: Build Accountability into Systems
• Opt for tamper-resistant audit logs. The less transparency there is, the greater the need for immutable logs, mandated or not.
More here:
Immutable Audit Logs (IAL’s)
Found: An Immutable Audit Log
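A hash chain is one common way to make an audit log tamper-evident. The Python sketch below is a generic illustration, not the design from the posts cited above. Each entry commits to the hash of the previous one, so editing an entry, or removing any entry other than the last, breaks verification. Someone able to rewrite the whole log could still recompute every hash. In practice the latest hash would therefore also be held by an independent party, which is an assumption this sketch does not cover.

```python
import hashlib
import json

GENESIS = "0" * 64

def entry_hash(body):
    """Hash a log entry body deterministically."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class HashChainedLog:
    """Append-only log where each entry commits to the hash of the one before it."""

    def __init__(self):
        self.entries = []   # list of (body, hash)

    def append(self, event):
        prev = self.entries[-1][1] if self.entries else GENESIS
        body = {"prev": prev, "event": event}
        h = entry_hash(body)
        self.entries.append((body, h))
        return h

    def verify(self):
        """Recompute the chain; an edited entry, or a removed non-final entry, breaks it."""
        prev = GENESIS
        for body, h in self.entries:
            if body["prev"] != prev or entry_hash(body) != h:
                return False
            prev = h
        return True

log = HashChainedLog()
log.append({"user": "analyst7", "action": "query", "subject": "B-9103"})
log.append({"user": "analyst7", "action": "fetch", "subject": "A-701"})
print(log.verify())                               # True
log.entries[0][0]["event"]["subject"] = "X-1"     # tamper with history
print(log.verify())                               # False
```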
Comments on: Data Mining
• Data mining is not bad. There are settings where data mining is very valuable and saves lives
• Predictive Data Mining – Limited efficacy without volumes of training data
• Predicate Triage – Used to organize data sets containing only “subjects of interest”
More here:
Effective Counterterrorism and the Limited Role of Predictive Data Mining
Data Mining, Predicate Triage and NSA Domestic Surveillance
Data Mining Defined (humorous)
“Torturing the data until it confesses … and if you torture it enough, you can get it to confess to anything.”
ACM SIGKDD Conference, Philadelphia 2006
Comments on: Link Analysis
• Link analysis is very powerful when used in a narrow fashion: inspecting outward from “subjects of interest.”
• Predicate-based link analysis: Big social maps are not useful unless one has an entry point.
• Link analysis: prune early
More here:
Hunting Bad Guys, Phone Records and a Few Good Dead Men
Predicate-based Link Analysis: A Post 9/11 Analysis (1+1= 13)
Sometimes a Big Picture is Worth a 1,000 False Positives
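Predicate-based link analysis with early pruning might look like the breadth-first walk below. The hop limit, the degree cutoff and the pizza-shop hub are hypothetical choices made for illustration. The slides say to start from a subject of interest and prune early, but they do not specify a pruning rule.

```python
from collections import deque

def expand_from_subject(graph, subject, max_hops=2, max_degree=50):
    """Walk outward from a known subject of interest, stopping at `max_hops`
    and refusing to expand through very high-degree nodes."""
    seen = {subject: 0}
    queue = deque([subject])
    while queue:
        node = queue.popleft()
        hops = seen[node]
        if hops == max_hops:
            continue
        neighbors = graph.get(node, ())
        if node != subject and len(neighbors) > max_degree:
            continue   # prune early: a hub links too many unrelated people
        for nxt in neighbors:
            if nxt not in seen:
                seen[nxt] = hops + 1
                queue.append(nxt)
    return seen

GRAPH = {   # hypothetical call graph
    "S": ["A", "P"],
    "A": ["S", "B"],
    "B": ["A", "C"],
    "P": ["S"] + [f"caller{i}" for i in range(100)],   # a pizza shop everyone calls
}
print(sorted(expand_from_subject(GRAPH, "S")))                # ['A', 'B', 'P', 'S']
print(len(expand_from_subject(GRAPH, "S", max_degree=1000)))  # 104
```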
Comments on: Watch Listing and False Positives
• The difference between wrongly named and wrongly matched
• Low-fidelity watch lists are the single biggest cause of false positives; resolving this ambiguity requires additional data
• Minimize collection; maximize consumer participation and election
• Provide a redress process
More here:
Precision in TSA’s Terrorist Watch List
Comments on the TSA No-Fly and Selectee Watch List Process
Closing Thoughts
“The data must find the data … and the relevance must find the user.”
In Closing
• There are going to be more sensors and more data
• This data will be commingled for greater accuracy, to serve consumers and protect countries
• What data is collected/observed, and when, will be the debate
• Chief privacy principle: Avoid consumer surprise
• If it has been collected, the holder has the obligation to make sense of it
• Organizations must harness data to be smart, efficient, and survive … but how smart do they need to be, and do we trust them?
• Hence the tension
Related Papers
• Heritage Foundation, Paul Rosenzweig and Jeff Jonas: Correcting False Positives: Redress and the Watch List Conundrum
• Cato Institute, Jeff Jonas and Jim Harper: Effective Counterterrorism and the Limited Role of Predictive Data Mining
• Steptoe & Johnson, Stewart Baker: Anonymization, Data-Matching and Privacy: A Case Study
• IEEE Security and Privacy, Jeff Jonas: Threat and Fraud Intelligence: Las Vegas Style
• Giannino Bassetti Foundation, Jeff Ubois: Transparency, Privacy and Responsibility: An Interview with Jeff Jonas
• Markle Foundation: Nation At Risk: Policy Makers Need Better Information to Protect the Country
Related Blog Posts
Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel
Puzzling: How Observations Are Accumulated Into Context
When Risk Assessment is the Risk
Big Data. New Physics.
The Christmas Day Intelligence Failure – Part II: Jeff Jonas’ Christmas Wish List
Decommissioning Data: Destruction of Accountability
Source Attribution, Don’t Leave Home Without It
Data Tethering: Managing the Echo
Out-bound Record-level Accountability in Information Sharing Systems
To Anonymize or Not Anonymize, That is the Question
Immutable Audit Logs (IAL’s)
The Information Sharing Paradox
Discoverability: The First Information Sharing Principle
When Federated Search Bites
Using Transparency As A Mask