18739A: Foundations of Security and Privacy
Course Review
Anupam Datta
CMU
Fall 2007-08

Goals of course
• Provide an overview of foundational work in security and privacy
  • Self-contained introduction + state-of-the-art research
• Fundamental questions
  • What does being secure mean?
    • Model of system + attacker
  • Is a given system secure?
    • Sound analysis methods

Goals of course (2)
• Cover 4 central research areas
  • Security Protocols
  • Distributed Access Control
  • Privacy
  • Language-based Security
• An experiment – existing courses typically focus on one area

Goals of course (3)
• Introduction to general analysis methods
  • Model-checking
  • Logics
  • Process calculi
  • Logic programming
  • Type systems
• Application to practical security mechanisms
  • Industrial security protocols
  • Grey system for distributed access control
  • Specification and enforcement of privacy laws such as HIPAA in LPU
  • Cyclone (Safe C) and Jif (Java + information flow)

Goals of course (4)
• Provide breadth in area
  • Lectures and homeworks
• Provide some depth in area
  • Course project
• Largely successful!

Four broad topics
1. Security Protocols
2. Distributed Access Control
3. Privacy
4. Language-based Security

Security Protocol Analysis
• The Problem: Is a given network protocol secure?
• First define:
  • Model of protocol
  • Model of attacker
  • Security properties
    • Secrecy, confidentiality
    • Authentication, integrity
    • Denial of service

Methods
• Bug finding
  • Automated model-checking techniques
  • Finite number of sessions
• Security proofs
  • Absence of bugs
  • Unbounded number of sessions
  • Many approaches: Paulson’s Inductive Method, Protocol Composition Logic, Applied Pi Calculus

Modeling Cryptography
• Symbolic Model
  • “Perfect crypto”: no attacker can break the primitives, e.g. an encrypted message can be decrypted iff the attacker has the decryption key
  • Proof technique: Induction
• Complexity-theoretic Model
  • Primitives secure with high probability against probabilistic polynomial time attackers
  • Proof technique: Reduction
• Recent work combining methods

Specifying security
• Trace-based
  • Every execution satisfies desired security property
  • Model-checking, inductive method, PCL
• Equivalence-based
  • Real protocol indistinguishable from “ideal” protocol
  • Applied pi calculus (observational equivalence), cryptography (pseudorandomness, …)

Example: Authentication
• Authentication protocol
    A → B: {i}_k
    B → A: {i+1}_k
    A → B: “Ok”
• “Ideal” protocol
    A → B: {random1}_k
    B → A: {random2}_k
    B → A: random1, random2 on a magic secure channel
    A → B: “Ok” if numbers on real & magic channels match
• Real protocol is secure if it is observationally equivalent to ideal protocol

Course Projects
• Rivest’s ThreeBallot Voting Protocol
• Ryan’s Prêt à Voter Protocol
• Verified by Visa
• Tor Anonymity Protocol

Topic 2 of 4: Distributed Access Control

Access Control Picture
[Figure not preserved]

Distributed Access Control
• Requestor and monitor on different machines
• Policy distributed across different servers

We covered
• Access control logics
  • Lampson et al. “speaks-for” logic
  • Proof Carrying Authorization (PCA) and the Grey System
  • Constructive Authorization Logic
• Trust Management
  • RT – Role-based Trust Management

Trust management example [figure: EPub, ABU, StateU, Alice]
• EPub
  • Grants access to university students
  • Trusts universities to certify students
  • Trusts ABU to certify universities
• StateU: Alice is a student
• ABU: StateU is a university

Main issues
• How to represent policies
  • Naming, delegation
  • Syntax of logic/language (Lampson+, PCA, Constructive Logic, RT)
• How to reason by combining policies
  • Proof system for logics
  • Algorithms for RT (decision procedures for Datalog)
• How to collect relevant credentials
  • Distributed proof-search using heuristics in Grey
  • Provably correct credential chain discovery in RT

Topic 3 of 4: Privacy

Privacy Research Space
• What is Privacy? [Philosophy, Law, Public Policy]
• Formal Model, Policy Language, Compliance-check Algorithms [Programming Languages, Logic]
• Implementation-level Compliance [Software Engineering, Formal Methods]
• Data Privacy [Databases, Cryptography]

Privacy
• Scenarios:
  • Enterprises collect personal information – email and postal addresses – in many cases through web sites
  • Organizations such as hospitals and financial institutions hold sensitive personal information
• Fundamental questions:
  • Policy: Under what conditions is the collected information used and distributed?
  • Enforcement: Do organizational processes actually enforce the stated policy?
• Privacy Laws: HIPAA, GLBA, COPPA

Privacy Policy Languages
• P3P
  • Privacy policy specification for web sites
• E-P3P/EPAL
  • Enterprise privacy policy specification and enforcement
• Contextual Integrity and LPU
  • Philosophical theory of privacy
  • Formalization in temporal logic (specification and enforcement)
  • Expressing privacy laws, e.g. HIPAA, GLBA, COPPA

Contextual Integrity [N2004]
• Philosophical framework for privacy
• Central concept: Context
  • Examples: Healthcare, banking, education
• What is a context?
  • Set of interacting agents in roles
    • Roles in healthcare: doctor, patient, …
  • Norms of transmission
    • Doctors should share patient health information as per the HIPAA rules
  • Purpose
    • Improve health

Expressing Privacy in LPU
Allow message transmission if:
• at least one positive norm is satisfied; and
• all negative norms are satisfied

HIPAA – Healthcare Privacy
• HIPAA consists primarily of positive norms: share PHI (protected health information) if some rule explicitly allows it
  • Norms (2), (3), (5), (6) [formulas not preserved]
• Exception: negative norm about psychotherapy notes
  • Norm (4) [formula not preserved]

COPPA – Children’s Online Privacy
• COPPA consists primarily of negative norms
• Children can share their protected info only if parents consent
  • Norm (7) (condition) [formula not preserved]
  • Norm (8) (obligation – future requirements) [formula not preserved]

Sanitization of Databases
Real Database (RDB) → add noise, delete names, etc. → Sanitized Database (SDB)
• Real Database (RDB)
  • Health records
  • Census data
• Sanitized Database (SDB)
  • Protect privacy
  • Provide useful information (utility)

Re-identification by linking
• Linking two sets of data on shared attributes may uniquely identify some individuals
• Example [Sweeney]: de-identified medical data was released; linking it with the purchased voter registration list of Massachusetts re-identified the Governor
• 87% of the US population is uniquely identifiable by 5-digit ZIP, sex, date of birth

K-anonymity (1)
• Quasi-identifier: set of attributes (e.g. ZIP, sex, date of birth) that can be linked with external data to uniquely identify individuals in the population
• Make every record in the table indistinguishable from at least k-1 other records with respect to quasi-identifiers
• Linking on quasi-identifiers yields at least k records for each possible value of the quasi-identifier

K-anonymity and beyond
• Provides some protection: linking on ZIP, age, nationality yields 4 records
• Limitations: lack of diversity in sensitive attributes, background knowledge, subsequent releases on the same data set
• Utility: less suppression implies better utility

Topic 4 of 4: Language-based Security

Type Systems for Security
• Focus on the use of type systems to improve software security
• Two representative projects
  • Cyclone: memory-safe dialect of C, i.e. no buffer overflow attacks, format string vulnerabilities etc. (or CCured)
  • Jif: enforcing information flow security properties (non-interference and variants)

Definition of Security
• Non-interference (idea)
  [Figure: a program takes high input HI (or HI’) and low input LI, and produces high output HO (or HO’) and low output LO]
• No information flows from high inputs to low outputs

Example
  if x = 1 then y := 1 else y := 0

  x   y   NI
  H   H   Yes
  L   L   Yes
  H   L   No
  L   H   Yes

Language definition
• Syntax
• Type system (static semantics)
• Operational semantics (dynamic semantics)
• Type safety (soundness) theorem

Soundness Theorem
[Theorem statement not preserved]

What next?
Security courses@CMU
• 18730 – Introduction to Computer Security
• 18731 – Network Security
  • Not much overlap, except network security protocols
• 18732 – Secure Software Systems
  • Complementary course on software security
• 18733 – Applied Cryptography
  • Some overlap in topics; presentation focuses more on attacks and mechanisms, not security models and analysis
  • Complementary course; details of crypto that we treated as black boxes (offered next semester)
• 15-819 – Languages and Logics for Security
  • Reading seminar focused primarily on language-based security

The End
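
Backup: symbolic attacker knowledge (sketch)

The “perfect crypto” assumption of the symbolic model (a message can be decrypted iff the decryption key is known) can be illustrated in a few lines of Python. This is a minimal sketch, not the formalism used in the course: the tuple encoding of terms, the function name `analyze`, and the restriction to atomic symmetric keys and to decomposition steps only are all assumptions made for illustration.

```python
def analyze(knowledge):
    """Close a set of terms under projection and decryption with known keys.

    Assumed (hypothetical) term encoding:
      ("pair", a, b)     : pairing of terms a and b
      ("enc", msg, key)  : symmetric encryption of msg under an atomic key
      any string         : an atomic name (key, nonce, payload)
    """
    known = set(knowledge)
    changed = True
    while changed:
        changed = False
        for term in list(known):
            derived = set()
            if isinstance(term, tuple) and term[0] == "pair":
                # Pairs can always be split.
                derived = {term[1], term[2]}
            elif isinstance(term, tuple) and term[0] == "enc" and term[2] in known:
                # Perfect crypto: plaintext is learned only if the key is known.
                derived = {term[1]}
            if not derived <= known:
                known |= derived
                changed = True
    return known


ciphertext = ("enc", "secret", "k")
print("secret" in analyze({ciphertext}))                      # False
print("secret" in analyze({ciphertext, ("pair", "k", "n")}))  # True
```

The loop is a plain fixpoint computation; a fuller attacker model would also let the attacker build new pairs and encryptions, which this sketch leaves out.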
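
Backup: credential chains in the EPub example (sketch)

The EPub example from the distributed access control part can be run as a small bottom-up (Datalog-style) evaluation. The sketch below is illustrative only: the role names `access`, `university`, `accredited` and `student`, the tuple encoding of credentials, and the function `members` are hypothetical choices, and the naive fixpoint loop stands in for the goal-directed chain discovery algorithms developed for RT.

```python
def members(credentials):
    """Compute role membership as a least fixpoint (naive bottom-up evaluation).

    Assumed (hypothetical) credential encoding, roles written as (principal, name):
      ("member",  head, principal)      head <- principal
      ("include", head, other_role)     head <- other_role
      ("link",    head, base_role, r2)  head <- (members of base_role).r2
    """
    roles = {}
    changed = True
    while changed:
        changed = False
        for cred in credentials:
            kind, head = cred[0], cred[1]
            if kind == "member":
                new = {cred[2]}
            elif kind == "include":
                new = set(roles.get(cred[2], set()))
            elif kind == "link":
                new = set()
                for principal in roles.get(cred[2], set()):
                    new |= roles.get((principal, cred[3]), set())
            else:
                raise ValueError(f"unknown credential kind: {kind}")
            current = roles.setdefault(head, set())
            if not new <= current:
                current |= new
                changed = True
    return roles


credentials = [
    ("member", ("StateU", "student"), "Alice"),                       # StateU: Alice is a student
    ("member", ("ABU", "accredited"), "StateU"),                      # ABU: StateU is a university
    ("include", ("EPub", "university"), ("ABU", "accredited")),       # EPub trusts ABU on universities
    ("link", ("EPub", "access"), ("EPub", "university"), "student"),  # students of universities get access
]
print(sorted(members(credentials)[("EPub", "access")]))  # ['Alice']
```

Removing either the StateU or the ABU credential leaves `("EPub", "access")` empty, which is the point of the example: access depends on a chain of credentials held by different parties.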
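
Backup: combining positive and negative norms (sketch)

The rule “allow transmission if at least one positive norm is satisfied and all negative norms are satisfied” can be written as a one-line check. This is a minimal sketch under strong assumptions: norms in LPU are temporal logic formulas over traces, whereas here each norm is a plain Python predicate over a single message, and the two toy norms and the message fields are hypothetical and are not the actual HIPAA rules.

```python
def transmission_allowed(message, positive_norms, negative_norms):
    """Allow iff some positive norm holds and every negative norm holds."""
    return (any(norm(message) for norm in positive_norms)
            and all(norm(message) for norm in negative_norms))


# Hypothetical toy norms for illustration; NOT the text of any law.
def doctor_may_send_to_patient(msg):
    # Positive norm: this flow is explicitly permitted.
    return msg["sender_role"] == "doctor" and msg["recipient_role"] == "patient"


def psychotherapy_notes_need_approval(msg):
    # Negative norm, read as "only if": notes may flow only with approval.
    return msg["attribute"] != "psychotherapy-notes" or msg["approved"]


positive = [doctor_may_send_to_patient]
negative = [psychotherapy_notes_need_approval]

test_result = {"sender_role": "doctor", "recipient_role": "patient",
               "attribute": "test-result", "approved": False}
notes = dict(test_result, attribute="psychotherapy-notes")

print(transmission_allowed(test_result, positive, negative))  # True
print(transmission_allowed(notes, positive, negative))        # False
```

With no positive norms the check denies every message, which matches the default-deny reading of “at least one positive norm”.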
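
Backup: checking k-anonymity (sketch)

The k-anonymity condition (every record indistinguishable from at least k-1 others on the quasi-identifiers) and the “lack of diversity” limitation can both be checked mechanically. The sketch below is an illustration: the function names and the toy table are hypothetical, with the table built so that linking on ZIP, age and nationality yields 4 records per group.

```python
from collections import Counter, defaultdict


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination present occurs at least k times."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def min_sensitive_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values within any group (0 if empty)."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[a] for a in quasi_identifiers)].add(r[sensitive])
    return min((len(values) for values in groups.values()), default=0)


# Hypothetical sanitized table: ZIP and age generalized, nationality suppressed.
QI = ("zip", "age", "nationality")
table = (
    [{"zip": "130**", "age": "<30", "nationality": "*", "condition": c}
     for c in ("heart disease", "heart disease", "viral infection", "viral infection")]
    + [{"zip": "148**", "age": ">=40", "nationality": "*", "condition": "cancer"}
       for _ in range(4)]
)

print(is_k_anonymous(table, QI, 4))                     # True
print(is_k_anonymous(table, QI, 5))                     # False
print(min_sensitive_diversity(table, QI, "condition"))  # 1
```

The last line shows the limitation named on the slide: the table is 4-anonymous, yet everyone in the second group has the same condition, so linking still reveals the sensitive value.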
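
Backup: checking the non-interference example by enumeration (sketch)

The Yes/No table for `if x = 1 then y := 1 else y := 0` can be reproduced by testing the definition directly: two runs that agree on the low variables at the start must agree on them at the end. Note that this swaps in an exhaustive semantic check over a tiny domain in place of the type-system approach covered in the course; the function names and the restriction of values to {0, 1} are assumptions made for illustration.

```python
from itertools import product


def program(state):
    """if x = 1 then y := 1 else y := 0"""
    x = state["x"]
    return {"x": x, "y": 1 if x == 1 else 0}


def low_view(state, labels):
    """The part of the state visible to a low observer."""
    return {var: val for var, val in state.items() if labels[var] == "L"}


def noninterfering(prog, labels, domain=(0, 1)):
    """Exhaustively check: equal low inputs imply equal low outputs."""
    names = sorted(labels)
    states = [dict(zip(names, vals)) for vals in product(domain, repeat=len(names))]
    for s1, s2 in product(states, repeat=2):
        if low_view(s1, labels) == low_view(s2, labels):
            if low_view(prog(s1), labels) != low_view(prog(s2), labels):
                return False
    return True


for x_label, y_label in [("H", "H"), ("L", "L"), ("H", "L"), ("L", "H")]:
    ok = noninterfering(program, {"x": x_label, "y": y_label})
    print(x_label, y_label, "Yes" if ok else "No")
```

Only the labeling with x high and y low fails: the branch on x is an implicit flow into y, which is exactly what an information-flow type system is designed to reject statically.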