Immunology and Computer Security Steven Hofmeyr, Anil Somayaji

Sensitive Data In a Wired World
Negative Representations of Data

Stephanie Forrest
Dept. of Computer Science
Univ. of New Mexico
Albuquerque, NM
http://cs.unm.edu/~forrest
forrest@cs.unm.edu

Introduction
• Goal: Develop new approaches to data security and privacy that incorporate design principles from living systems:
  – Survivability and evolvability
  – Autonomy
  – Robustness, adaptation and self repair
  – Diversity
• Extends earlier work on computational properties of the immune system:
  – Intrusion detection
  – Automated response
  – Collaborative information filtering

Project Overview
• Immunology and data:
  – Negative representations of information
• Epidemiology and the Internet:
  – Social networks matter
  – The real world is not always scale free
• The social utility of privacy:
  – Why is privacy an important value in democratic societies?
  – Evolutionary perspective

Collaborations
• Paul Helman and Cris Moore (UNM)
• Robert Axelrod and Mark Newman (Univ. Michigan)
• Matthew Williamson (Sana Security)
• Rebecca Wright and Michael de Mare (Stevens)
• Joan Feigenbaum and Avi Silberschatz (Yale)
  – Fernando Esponda's post-doc next year.

How the Immune System Distributes Detection
• Many small detectors matching nonself (negative detection).
• Each detector matches multiple patterns (generalization).
• Advantages of distributed negative detection:
  – Localized (no communication costs)
  – Scalable and tunable
  – Robust (no single point of failure)
  – Private

Applications to Computing
• Anomaly detectors (earlier work)
• Information filters (earlier work)
• Adaptive queries (future)
• Negative representations (in progress)
  – A positive set DB is a set of fixed length strings.
  – A negative set NDB represents all the strings not in DB.
  – Intuition: If an adversary obtains a string from NDB, little information is revealed.
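The definitions above can be made concrete with a small sketch. It assumes a tiny binary universe so that U-DB can be listed explicitly; the function names and the sample DB are illustrative and not taken from the slides.

```python
from itertools import product

def universe(alphabet, length):
    """Yield every string of the given length over the alphabet (the set U)."""
    for chars in product(alphabet, repeat=length):
        yield "".join(chars)

def explicit_complement(db, alphabet, length):
    """Return U - DB as an explicit set; only feasible for tiny universes."""
    return {s for s in universe(alphabet, length) if s not in db}

def complement_size(db, alphabet, length):
    """Count the strings in U - DB without enumerating them.

    Assumes every string in db has the given length and uses only
    characters from the alphabet.
    """
    return len(alphabet) ** length - len(db)

# Illustrative positive set over 3-bit strings (an assumption for this sketch).
db = {"000", "101", "111"}
u_minus_db = explicit_complement(db, "01", 3)
print(sorted(u_minus_db))
print(complement_size(db, "01", 3))
```

Listing U-DB explicitly stops being practical very quickly as the string length grows, which is why a compact NDB is needed.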
Example:
  – U = all possible four-character strings
  – DB = {juan, eric, dave}
  – U-DB = {aaaa, aaab, cris, john, luca, raul, tehj, tosh, …}
  – There are 26^4 - 3 = 456973 strings in U-DB.

Results
• Can U-DB be represented efficiently, given |U-DB| >> |DB|?
  – YES: There is an algorithm that creates an NDB of size polynomial in the size of DB.
  – Strategy: Compress information using a don't-care symbol (*). Other representations?

    DB    U-DB   NDB
    000   001    01*
    101   010    0*1
    111   011    1*0
          100
          110

• What properties does the representation have?
  – Membership queries are tractable (linear time even without indexing).
    • Other queries, information leakage are future work.
  – Inferring information from a subset of NDB (next slide).
  – Inferring DB from NDB is NP-Hard (note: not doing crypto):
    • Currently investigating instance difficulty.
    • Algorithms for increasing instance difficulty.
    • On-line insert/delete algorithms preserve problem difficulty.
    • Collaborations with R. Wright, M. de Mare, and C. Moore.

What information is revealed by queries? (without assuming irreversibility)
• Having access to a subset of NDB (or DB) yields some information about strings outside that subset:
  – Assume NDB (or DB) is partitioned into n subsets.
• To the query "Is x in DB," what do I learn about x if x is not in my subset?
  – Must consult n subsets of NDB to conclude that x is in DB.
  – Must consult the subsets only until x is found (on average n/2).
  – Assumes that we care more about DB than U-DB.

[Figure: Probability and information content as the membership of strings is revealed. DB contains 10% of all possible L-length strings (formulas).]

Private Set Intersection
• Determine which records are in the intersection of several databases, i.e.
  – DB1 ∩ DB2 ∩ … ∩ DBn
  – = U - (NDB1 ∪ NDB2 ∪ … ∪ NDBn)
• Each party may compute the intersection
  – DBi - (NDB1 ∪ NDB2 ∪ … ∪ NDBn)
• Party i learns only the intersection of all the sets,
• And not the cardinality of the other sets.

Results cont.
• How might these properties be useful?
  – Protect data from insider attacks
  – Computing set intersections
  – Surveys involving sensitive information
  – Anonymous digital credentials
  – Fingerprint databases
  – Other ideas?
• Prototype implementations:
  – Perl, C
  – http://esa.ackleyshack.com/ndb
  – See demo

Computer Epidemiology
Justin Balthrop, Mark Newman, Matt Williamson

[Figure: two plots against degree k, with series labeled IP network, Administrator network, Email traffic, and Address books. Source: Science 304:527-529 (2004).]

• Information spreads over networks of social contacts between computers:
  – Email address books.
  – URL links.
• Network topology affects the rate and extent of spreading:
  – Epidemiological models, and the epidemic threshold.
• Controlling spread on scale-free networks:
  – Random vaccination is ineffective (e.g., anti-virus software).
  – Targeted vaccination of high-connectivity nodes.
  – Control degree distribution in time rather than space.

The Social Utility of Privacy
Robert Axelrod and Ryan Gerety
• Typical framing:
  – Privacy values should remain as is (e.g., Lessig).
  – Individual rights vs. state (i.e., civil liberties vs. community safety / crime).
• A community may have its own interest in defending individual privacy (and not), independent of the civil liberties argument:
  – To promote innovation in changing environments.
  – To cope with distortions (e.g., overconfidence of middle managers).
  – To compensate for overgeneralized norms.
• Not necessarily advocating more privacy:
  – From a societal/informational point of view, how should appropriate bounds on privacy be determined?
• Current status:
  – Exploratory modeling based on simple games.

Next Steps: Negative Representations
• Distributed negative representations
• Leaking partial information
• Relational algebra operators on the negative database:
  – Select, join, etc.
• Instance difficulty:
  – Hiding given satisfying assignments in a SAT formula
  – Approximate representations
  – Other representations?
• More realistic implementations
• Negative data mining:
  – Is it easier/harder to find certain instances in NDB?
• Imprecise representations:
  – Partial matching and queries
  – Learning algorithms

People
Stephanie Forrest
Paul Helman
Fernando Esponda
Elena Ackley

Publications
• F. Esponda, S. Forrest, and P. Helman. "Negative representations of information." International Journal of Information Security (submitted March 2005).
• F. Esponda, E. S. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Journal of Unconventional Computing (in press).
• F. Esponda, S. Forrest, and P. Helman. "A formal framework for positive and negative detection." IEEE Transactions on Systems, Man, and Cybernetics 34:1 pp. 357-373 (2004).
• J. Balthrop, S. Forrest, M. Newman, and M. Williamson. "Technological networks and the spread of computer viruses." Science 304:527-529 (2004).
• H. Inoue and S. Forrest. "Inferring Java security policies through dynamic sandboxing." 2005 International Conference on Programming Languages and Compilers (PLC'05) (in press).
• F. Esponda, E. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Third International Conference on Artificial Immune Systems (ICARIS), Best paper award (2004).

SUPPLEMENTARY MATERIAL

Probabilities

\[ F_1 = P(x \in DB \mid x \notin NDB_{f_j}) = \frac{|DB|}{|U| - |NDB_{f_j}|} \]

\[ F_2 = P(x \in DB \mid x \notin DB_{f_j}) = \frac{|DB| - |DB_{f_j}|}{|U| - |DB_{f_j}|} \]

\[ H_N(x) = -F_1 \log_2 F_1 - (1 - F_1) \log_2 (1 - F_1) \]

\[ H_P(x) = -F_2 \log_2 F_2 - (1 - F_2) \log_2 (1 - F_2) \]

Generating Hard-to-Reverse Negative Databases
[Figure: Instance Difficulty (l=64)]
• The randomized algorithm can be used to create a negative database.
• Insert/Delete operations turn known hard formulas into negative databases.
• The Morph operator may be used to search for hard instances.
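The connection between SAT formulas and negative databases that these bullets rely on can be sketched as follows. The mapping is an assumption here, since the slides do not spell it out: each clause becomes one NDB record that matches exactly the assignments falsifying that clause, so the strings matched by no record (that is, DB) are the satisfying assignments. The function names and the three-variable formula are hypothetical, and tautological clauses are not handled.

```python
from itertools import product

def clause_to_record(clause, num_vars):
    """Map a clause (DIMACS-style signed ints) to an NDB record.

    Each variable in the clause is fixed to the value that makes its literal
    false; every other position is the don't-care symbol '*'.
    """
    record = ["*"] * num_vars
    for literal in clause:
        record[abs(literal) - 1] = "0" if literal > 0 else "1"
    return "".join(record)

def matches(record, string):
    """A record matches a string if every specified position agrees."""
    return all(r == "*" or r == s for r, s in zip(record, string))

def satisfies(formula, string):
    """True if the bit string satisfies every clause of the CNF formula."""
    return all(
        any((string[abs(lit) - 1] == "1") == (lit > 0) for lit in clause)
        for clause in formula
    )

# Hypothetical formula: (x1 or not x2) and (x1 or not x3) and (not x1 or x3)
formula = [[1, -2], [1, -3], [-1, 3]]
ndb = [clause_to_record(clause, 3) for clause in formula]

all_strings = ["".join(bits) for bits in product("01", repeat=3)]
db = [s for s in all_strings if not any(matches(r, s) for r in ndb)]
sat = [s for s in all_strings if satisfies(formula, s)]

print(ndb)        # records derived from the clauses
print(db == sat)  # DB coincides with the satisfying assignments
```

Under this mapping, recovering DB from NDB amounts to finding satisfying assignments, which is why hard satisfiable formulas are a natural source of hard-to-reverse negative databases.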
[Figures: the Instance Difficulty plots show decisions (zchaff) on the y-axis against specified bits per record (k-SAT), 1 to 8, on the x-axis. The second plot, Instance Difficulty (Glassy8 formula, l=64), compares the Original NDB with the Updated NDB.]

H. Jia, C. Moore and B. Selman, "From spin glasses to hard satisfiable formulas," SAT 2004.

Effect of the Morph operation
• The Morph operation takes as input a negative database NDB and outputs NDB' that represents the same set U-DB.
• The plot shows how the complexity of a database changes after applying the Morph operator.
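The invariant stated for Morph, that NDB' represents the same set U-DB as NDB, can be checked by brute force on small instances. The sketch below does not implement Morph itself; it only shows don't-care matching, the membership query, and an equivalence check. The function names and the alternative database `ndb_prime` are illustrative assumptions.

```python
from itertools import product

def matches(record, string):
    """A record matches a string if every specified position agrees."""
    return len(record) == len(string) and all(
        r == "*" or r == s for r, s in zip(record, string)
    )

def in_db(ndb, string):
    """Membership query: a string is in DB exactly when no NDB record matches it."""
    return not any(matches(record, string) for record in ndb)

def represented_set(ndb, length):
    """Expand an NDB over binary strings into the explicit set U - DB.

    Exponential in length, so only usable for small examples.
    """
    return {
        "".join(bits)
        for bits in product("01", repeat=length)
        if not in_db(ndb, "".join(bits))
    }

ndb = ["01*", "0*1", "1*0"]        # NDB from the Results slide
ndb_prime = ["001", "01*", "1*0"]  # hypothetical alternative, not produced by Morph

print(sorted(represented_set(ndb, 3)))
print(represented_set(ndb, 3) == represented_set(ndb_prime, 3))
print(in_db(ndb, "101"), in_db(ndb, "011"))
```

The membership query makes a single pass over the records, consistent with the linear-time claim on the Results slide; the equivalence check enumerates all of U and is meant only for small strings.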
