Immunology and Computer Security Steven Hofmeyr, Anil Somayaji

Sensitive Data In a Wired World
Negative Representations of Data
Stephanie Forrest
Dept. of Computer Science
Univ. of New Mexico
Albuquerque, NM
http://cs.unm.edu/~forrest
forrest@cs.unm.edu
Introduction
• Goal: Develop new approaches to data security and privacy that
incorporate design principles from living systems:
– Survivability and evolvability
– Autonomy
– Robustness, adaptation and self repair
– Diversity
• Extends earlier work on computational properties of the immune
system:
– Intrusion detection
– Automated response
– Collaborative information filtering
Project Overview
• Immunology and data:
– Negative representations of information
• Epidemiology and the Internet:
– Social networks matter
– The real world is not always scale free
• The social utility of privacy:
– Why is privacy an important value in democratic societies?
– Evolutionary perspective
Collaborations
• Paul Helman and Cris Moore (UNM)
• Robert Axelrod and Mark Newman (Univ. Michigan)
• Matthew Williamson (Sana Security)
• Rebecca Wright and Michael de Mare (Stevens)
• Joan Feigenbaum and Avi Silberschatz (Yale)
– Fernando Esponda’s post-doc next year.
How the Immune System Distributes Detection
• Many small detectors matching nonself (negative detection).
• Each detector matches multiple patterns (generalization).
• Advantages of distributed negative detection:
– Localized (no communication costs)
– Scalable and tunable
– Robust (no single point of failure)
– Private
Applications to Computing
• Anomaly detectors (earlier work)
• Information filters (earlier work)
• Adaptive queries (future)
• Negative representations (in progress)
– A positive set DB is a set of fixed length strings.
– A negative set NDB represents all the strings not in DB.
– Intuition: If an adversary obtains a string from NDB, little
information is revealed.
Example:
– U = all possible four-character strings
– DB = {juan, eric, dave}
– U-DB = {aaaa, aaab, cris, john, luca, raul, tehj, tosh, …}
– There are 26^4 - 3 = 456,973 strings in U-DB.
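The count in the example can be checked directly. The following is a minimal sketch, assuming the alphabet is the 26 lowercase letters (the slide does not state the alphabet, but 26^4 implies it); the function name `in_negative_set` is illustrative only.

```python
from itertools import product
from string import ascii_lowercase

LENGTH = 4
DB = {"juan", "eric", "dave"}

# Assumption: U is every string of four lowercase letters.
universe_size = len(ascii_lowercase) ** LENGTH   # 26^4
negative_size = universe_size - len(DB)          # |U-DB|

def in_negative_set(s):
    """True when s is a four-letter lowercase string that is not in DB."""
    return (len(s) == LENGTH
            and all(c in ascii_lowercase for c in s)
            and s not in DB)

# Explicit enumeration is feasible only because this universe is tiny.
enumerated = sum(1 for chars in product(ascii_lowercase, repeat=LENGTH)
                 if "".join(chars) not in DB)

print(universe_size, negative_size, enumerated)
```

Storing U-DB explicitly grows with the size of U, which is why a compressed representation is needed for realistic string lengths.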
Results
• Can U-DB be represented efficiently, given |U-DB| >> |DB|?
– YES: There is an algorithm that creates an NDB of size polynomial in the size of DB.
– Strategy: Compress information using a don’t-care symbol (*). Other
representations?
DB     U-DB    NDB
000    001     01*
101    010     0*1
111    011     1*0
       100
       110
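One simple way to obtain a compact NDB is to work prefix by prefix. The sketch below is an illustrative prefix-based construction over binary strings, not necessarily the algorithm referred to on the slide; the names `prefix_ndb`, `matches`, and `represented` are assumptions made for this example.

```python
from itertools import product

def matches(pattern, s):
    """A pattern matches s when every specified (non-'*') bit agrees."""
    return all(p == "*" or p == c for p, c in zip(pattern, s))

def represented(ndb, length):
    """Enumerate the binary strings matched by at least one NDB record."""
    return {"".join(bits) for bits in product("01", repeat=length)
            if any(matches(p, "".join(bits)) for p in ndb)}

def prefix_ndb(db, length):
    """Build don't-care patterns covering every binary string not in db.

    For each prefix p that occurs in db, if extending p by bit b yields a
    prefix that no db string has, emit p + b padded with '*'.
    """
    prefixes = {""} | {s[:i] for s in db for i in range(1, length + 1)}
    ndb = set()
    for p in prefixes:
        if len(p) == length:
            continue
        for b in "01":
            if p + b not in prefixes:
                ndb.add((p + b).ljust(length, "*"))
    return ndb

DB = {"000", "101", "111"}
U = {"".join(bits) for bits in product("01", repeat=3)}
ndb = prefix_ndb(DB, 3)
print(sorted(ndb))
```

This construction emits at most one record per DB prefix and bit position, so it has at most (string length) × |DB| records. For the table's DB it yields four records, while the NDB column of the table shows a different and slightly smaller set that represents the same U-DB.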
• What properties does the representation have?
– Membership queries are tractable (linear time even without indexing).
  • Other queries, information leakage are future work.
– Inferring information from a subset of NDB (next slide).
– Inferring DB from NDB is NP-Hard (note: not doing crypto):
  • Currently investigating instance difficulty.
  • Algorithms for increasing instance difficulty.
  • On-line insert/delete algorithms preserve problem difficulty.
  • Collaborations with R. Wright, M. de Mare, and C. Moore.
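The two properties above can be illustrated on the three-bit example. The sketch below assumes binary strings and uses illustrative names (`is_in_db`, `record_to_clause`, `satisfies`). It shows a membership query as a linear scan over NDB, and the correspondence between NDB records and Boolean clauses that connects reversing an NDB to satisfiability. Brute-force enumeration is used only because the example is tiny.

```python
from itertools import product

def matches(pattern, s):
    """A pattern matches s when every specified (non-'*') bit agrees."""
    return all(p == "*" or p == c for p, c in zip(pattern, s))

def is_in_db(x, ndb):
    """Membership query: x is in DB exactly when no NDB record matches it.
    One pass over the records, so time is linear in the size of NDB."""
    return not any(matches(record, x) for record in ndb)

def record_to_clause(record):
    """Map a record to a clause of signed 1-based variable indices.

    A string avoids the record when it disagrees on at least one specified
    bit: bit '0' at position i gives literal +i, bit '1' gives literal -i.
    """
    return [i if bit == "0" else -i
            for i, bit in enumerate(record, start=1) if bit != "*"]

def satisfies(assignment, clause):
    """assignment is a bit string; variable i is true when bit i is '1'."""
    return any((assignment[abs(lit) - 1] == "1") == (lit > 0)
               for lit in clause)

NDB = ["01*", "0*1", "1*0"]
clauses = [record_to_clause(r) for r in NDB]

# Strings satisfying every clause are exactly the strings of DB.
solutions = {"".join(bits) for bits in product("01", repeat=3)
             if all(satisfies("".join(bits), c) for c in clauses)}
print(clauses, sorted(solutions))
```

Under this mapping, recovering DB from NDB amounts to finding the satisfying assignments of a formula, and the number of specified bits per record plays the role of clause length.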
What information is revealed by queries?
(without assuming irreversibility)
• Having access to a subset of NDB (or DB) yields some information about strings
outside that subset:
– Assume NDB (or DB) is partitioned into n subsets.
• To the query “Is x in DB,” what do I learn about x if x is not in my subset?
– With NDB, must consult all n subsets to conclude that x is in DB.
– With DB, must consult the subsets only until x is found (on average n/2).
– Assumes that we care more about DB than U-DB.
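The difference in the number of subsets consulted can be sketched as follows. This is an illustrative toy, assuming the three-bit example with one record per subset; the function names are hypothetical.

```python
def matches(pattern, s):
    """A pattern matches s when every specified (non-'*') bit agrees."""
    return all(p == "*" or p == c for p, c in zip(pattern, s))

def consult_negative(x, ndb_parts):
    """Return (x_in_db, parts_consulted) using a partitioned NDB."""
    for count, part in enumerate(ndb_parts, start=1):
        if any(matches(p, x) for p in part):
            return False, count          # x is in U-DB, so we can stop
    return True, len(ndb_parts)          # every part had to be checked

def consult_positive(x, db_parts):
    """Return (x_in_db, parts_consulted) using a partitioned DB."""
    for count, part in enumerate(db_parts, start=1):
        if x in part:
            return True, count           # stop as soon as x is found
    return False, len(db_parts)

ndb_parts = [["01*"], ["0*1"], ["1*0"]]
db_parts = [["000"], ["101"], ["111"]]
members = ["000", "101", "111"]

negative_counts = [consult_negative(x, ndb_parts)[1] for x in members]
positive_counts = [consult_positive(x, db_parts)[1] for x in members]
average_positive = sum(positive_counts) / len(positive_counts)
print(negative_counts, positive_counts, average_positive)
```

In this toy, confirming membership takes all three NDB parts for every member, while the partitioned DB needs two parts on average, about half of n.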
Probability and information content as the membership of strings is
revealed. DB contains 10% of all possible L-length strings (formulas).
Private Set Intersection
• Determine which records are in the intersection of
several databases, i.e.
– DB1 ∩ DB2 ∩ … ∩ DBn
– = U − (NDB1 ∪ NDB2 ∪ … ∪ NDBn)
• Each party may compute the intersection
– DBi − (NDB1 ∪ NDB2 ∪ … ∪ NDBn)
• Party i learns only the intersection of all the sets,
• and not the cardinality of the other sets.
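The set algebra behind this computation can be sketched on three small binary databases. The example data and the name `private_intersection` are assumptions made for illustration, and the sketch shows only the set operations, not a secure protocol.

```python
def matches(pattern, s):
    """A pattern matches s when every specified (non-'*') bit agrees."""
    return all(p == "*" or p == c for p, c in zip(pattern, s))

def private_intersection(own_db, all_ndbs):
    """Keep the party's own records that no negative database matches,
    i.e. DBi minus the union of the sets represented by NDB1..NDBn."""
    return {x for x in own_db
            if not any(matches(p, x) for ndb in all_ndbs for p in ndb)}

# Hypothetical three-bit databases and hand-built negative databases.
DB1, NDB1 = {"000", "101", "111"}, ["01*", "0*1", "1*0"]
DB2, NDB2 = {"010", "101", "111"}, ["00*", "011", "1*0"]
DB3, NDB3 = {"000", "111"}, ["01*", "10*", "001", "110"]

all_ndbs = [NDB1, NDB2, NDB3]
result1 = private_intersection(DB1, all_ndbs)
result2 = private_intersection(DB2, all_ndbs)
result3 = private_intersection(DB3, all_ndbs)
print(result1, result2, result3)
```

Each party holds its own positive DB and sees only the other parties' negative databases, and each arrives at the same intersection.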
Results cont.
• How might these properties be useful?
– Protect data from insider attacks
– Computing set intersections
– Surveys involving sensitive information
– Anonymous digital credentials
– Fingerprint databases
– Other ideas?
• Prototype implementations:
– Perl, C
– http://esa.ackleyshack.com/ndb
– See demo
Computer Epidemiology
Justin Balthrop, Mark Newman, Matt Williamson
[Figure: two panels plotted against degree k for four networks: IP network, administrator network, email traffic, and address books.]
Science 304:527-529 (2004)
• Information spreads over networks of social contacts between computers:
– Email address books.
– URL links.
• Network topology affects the rate and extent of spreading:
– Epidemiological models, and the epidemic threshold.
• Controlling spread on scale-free networks:
– Random vaccination is ineffective (e.g., anti-virus software).
– Targeted vaccination of high-connectivity nodes.
– Control degree distribution in time rather than space.
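The contrast between random and targeted vaccination can be explored with a small simulation. The sketch below makes several assumptions not taken from the slides: the network is grown by preferential attachment, vaccination is modeled as removing a node, and the size of the largest remaining connected component stands in for how far an infection could spread. All names and parameters are illustrative.

```python
import random

def preferential_attachment_graph(n, m, rng):
    """Grow a graph in which each new node links to m existing nodes
    chosen with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    pool = []                      # each node appears once per incident edge
    for i in range(m + 1):         # seed: a clique on m + 1 nodes
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            pool += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            pool += [new, t]
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component once removed nodes are gone."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

rng = random.Random(0)
graph = preferential_attachment_graph(2000, 2, rng)
k = 200                            # vaccinate 10% of the nodes
random_set = set(rng.sample(sorted(graph), k))
by_degree = sorted(graph, key=lambda v: len(graph[v]), reverse=True)
targeted_set = set(by_degree[:k])

random_size = largest_component(graph, random_set)
targeted_size = largest_component(graph, targeted_set)
print(random_size, targeted_size)
```

Removing the highest-degree nodes should leave a noticeably smaller connected component than removing the same number of nodes at random, which is the qualitative point of the targeted-vaccination bullet.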
The Social Utility of Privacy
Robert Axelrod and Ryan Gerety
• Typical framing:
– Privacy values should remain as is (e.g., Lessig).
– Individual rights vs. state (i.e., civil liberties vs. community safety / crime).
• A community may have its own interest in defending individual privacy
(and not), independent of the civil liberties argument:
– To promote innovation in changing environments.
– To cope with distortions (e.g., overconfidence of middle managers).
– To compensate for overgeneralized norms.
• Not necessarily advocating more privacy:
– From a societal/informational point of view how should appropriate bounds
on privacy be determined?
• Current status:
– Exploratory modeling based on simple games.
Next Steps: Negative Representations
• Distributed negative representations
• Leaking partial information
• Relational algebra operators on the negative database:
– Select, join, etc.
• Instance difficulty:
– Hiding given satisfying assignments in a SAT formula
– Approximate representations
– Other representations?
• More realistic implementations
• Negative data mining:
– Is it easier/harder to find certain instances in NDB?
• Imprecise representations:
– Partial matching and queries
– Learning algorithms
People
Stephanie Forrest
Paul Helman
Fernando Esponda
Elena Ackley
Publications
• F. Esponda, S. Forrest, and P. Helman. "Negative representations of information."
International Journal of Information Security (submitted March 2005).
• F. Esponda, E. S. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Journal
of Unconventional Computing (in press).
• F. Esponda, S. Forrest, and P. Helman. "A formal framework for positive and negative
detection." IEEE Transactions on Systems, Man, and Cybernetics 34:1 pp. 357-373 (2004).
• J. Balthrop, S. Forrest, M. Newman, and M. Williamson. "Technological networks and the
spread of computer viruses." Science 304:527-529 (2004).
• H. Inoue and S. Forrest. "Inferring Java security policies through dynamic sandboxing."
2005 International Conference on Programming Languages and Compilers (PLC'05) (in
press).
• F. Esponda, E. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Third
International Conference on Artificial Immune Systems (ICARIS) Best paper award (2004).
SUPPLEMENTARY MATERIAL


Probabilities
\[ F_1 = P(x \in DB \mid x \notin NDB_{f_j}) = \frac{|DB|}{|U| - |NDB_{f_j}|} \]
\[ F_2 = P(x \in DB \mid x \notin DB_{f_j}) = \frac{|DB| - |DB_{f_j}|}{|U| - |DB_{f_j}|} \]
\[ H_N(x) = -F_1 \log_2 F_1 - (1 - F_1) \log_2 (1 - F_1) \]
\[ H_P(x) = -F_2 \log_2 F_2 - (1 - F_2) \log_2 (1 - F_2) \]
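These quantities are easy to evaluate numerically. The sketch below assumes that the NDB and DB terms in the denominators count the strings already revealed, and it uses a hypothetical universe of 1,000 strings with DB holding 10% of them; the function names are illustrative.

```python
from math import log2

def binary_entropy(p):
    """Entropy in bits of a yes/no outcome that has probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def f1(db_size, universe_size, ndb_seen):
    """P(x in DB | x is not among the ndb_seen revealed strings of U-DB)."""
    return db_size / (universe_size - ndb_seen)

def f2(db_size, universe_size, db_seen):
    """P(x in DB | x is not among the db_seen revealed strings of DB)."""
    return (db_size - db_seen) / (universe_size - db_seen)

U_SIZE, DB_SIZE = 1000, 100        # hypothetical sizes, DB is 10% of U

for fraction in (0.0, 0.5, 1.0):
    ndb_seen = int(fraction * (U_SIZE - DB_SIZE))
    db_seen = int(fraction * DB_SIZE)
    p_neg = f1(DB_SIZE, U_SIZE, ndb_seen)
    p_pos = f2(DB_SIZE, U_SIZE, db_seen)
    print(fraction, p_neg, binary_entropy(p_neg), p_pos, binary_entropy(p_pos))
```

With nothing revealed both probabilities equal |DB|/|U|. Once all of NDB or all of DB has been revealed, the probability becomes 1 or 0 respectively and the entropy drops to zero.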

Generating Hard-to-Reverse Negative Databases
• The randomized algorithm can be used to create a negative database.
• Insert/Delete operations turn known hard formulas into negative
databases.
• The Morph operator may be used to search for hard instances.
[Figure: Instance Difficulty (l=64). Vertical axis: Decisions (zchaff); horizontal axis: Specified bits per record (k-SAT).]
[Figure: Instance Difficulty (Glassy8 formula, l=64). Vertical axis: Decisions (zchaff); horizontal axis: Specified bits per record (k-SAT); series: Original NDB, Updated NDB.]
H. Jia, C. Moore and B. Selman, "From spin glasses to hard satisfiable
formulas," SAT 2004.
Effect of the Morph operation
• The Morph operation takes as input a negative database NDB and
outputs NDB’ that represents the same set U-DB.
• The plot shows how the complexity of a database changes after
applying the morph operator.
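The slides do not define the Morph operator, so the sketch below is not that operator. It shows one simple transformation with the same contract, producing a different set of records that represents the same U-DB, by replacing a record that contains a don't-care symbol with its two expansions. All names are illustrative.

```python
import random
from itertools import product

def matches(pattern, s):
    """A pattern matches s when every specified (non-'*') bit agrees."""
    return all(p == "*" or p == c for p, c in zip(pattern, s))

def represented(ndb, length):
    """Enumerate the binary strings matched by at least one NDB record."""
    return {"".join(bits) for bits in product("01", repeat=length)
            if any(matches(p, "".join(bits)) for p in ndb)}

def expand_record(record, position):
    """Split a record with '*' at position into two records fixing that bit."""
    assert record[position] == "*"
    return {record[:position] + b + record[position + 1:] for b in "01"}

def rewrite(ndb, rng):
    """Replace one randomly chosen don't-care symbol by its two expansions.
    The records change, but the set of strings they match does not."""
    candidates = [(r, i) for r in sorted(ndb)
                  for i, c in enumerate(r) if c == "*"]
    if not candidates:
        return set(ndb)
    record, position = rng.choice(candidates)
    return (set(ndb) - {record}) | expand_record(record, position)

NDB = {"01*", "0*1", "1*0"}
NDB_PRIME = rewrite(NDB, random.Random(1))
print(sorted(NDB_PRIME))
```

Because the rewritten database has a different number of specified bits per record, transformations of this kind change the shape of the corresponding SAT instance without changing the set it represents.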