CPS110: General security Landon Cox Intro: general security Hardware reality Processor, memory, disks, NIC Components can be used by anyone OS abstraction Controlled access to hardware Security abstractions What HW primitives exist for access control? Processor mode bit (kernel/user mode) Protected instructions (I/O instructions, halt) Protected data (page tables, interrupt vector) Want to build two abstractions on these Identity (authentication) Who are you? Security policy (authorization) What are you allowed to do? Building up from the hardware One of the themes of this class Threads Atomic test&set, interrupt enable/disable Transactions Single disk sector write Reliable communication Atomic packet send Building up from the hardware Already built on top of HW security primitives Secure sharing of physical memory Use protection between address spaces (fault on unauthorized access) Secure sharing of kernel services Use system calls to safely transition to privileged mode (trap after syscall instruction) Authentication: who are you? Prove your identity to the OS Many ways to authenticate 1. Passwords 2. Physical tokens 3. Biometrics Password authentication Password Shared secret between you and the OS Several weaknesses How should we store passwords? Cryptographic hashes (MD5, SHA) Hashes are one-way functions Check that hash(input) = hash(password) Cryptographic hashes SHA1 hash “Hash digest” Arbitrarily large 160 bits Infeasible to reverse (even for the system!) I.e. cannot compute plaintext from digest Collision-resistant hash function Probability (collision) < probability (HW error) Extremely useful tool in lots of domains Can be used as a shorthand for naming objects (e.g. files) Password authentication What’s the weakest link in password systems? People! People choose short passwords (brute force attacks take less time) People choose easy-to-remember passwords (narrows the search space: dictionary attack) People choose the same password for everything (break the weakest web site, gain access to the strongest) How could you use cryptographic hashes to solve this? User remembers one password, systems sends Hash(pass + site name) Weak passwords Consider an 8-character password 256^8 possible passwords 2^64 ~ 16 * billion * billion If you only choose lower-case letters 26^8 ~ 200 billion If you choose a word from the dictionary 320,000 words in Webster’s unabridged 1000/second find a password in 5 minutes Researchers ran an attack on Michigan passwords Recovered thousands; some of the most popular: “beer”, “hockey” Similar result at Berkeley CS in the 80s: “gandolph” Physical token authentication Require something physical E.g. a ticket to a dance performance What if your token is stolen/forged? Require a physical token + password E.g. your ATM card + PIN Biometric authentication Essentially a physical token One that should be hard to steal/forge Examples Thumbprint, retina scan, signature Most have accuracy problems False positives (accept invalid person) False negatives (reject valid person) Real-world authentication How to authenticate to your credit card company? Give them your social security number They ask over the phone and check How is this vulnerable? Company has my SSN, so they can pretend to be me (or someone who breaks into them can pretend to be me) Phone line snoop can pretend to be me Authorization: what can you do? Fundamental structure Access control matrix Rows = authentication domains Columns = objects File 1 File 2 File 3 User 1 RW RW RW User 2 -- R RW Two approaches to storing data: Access control lists and capabilities Access control lists Standard for file systems At each object Store who can access the object Store how the object can be accessed On each access Check that the user has proper permissions Use groups to make things more convenient E.g. forbes, chase, and lpcox are all in group “prof” Access control lists How can you defeat ACL authorization? Villain convinces the OS it is someone else Example Sendmail runs as root (full privileges) Attacker compromises sendmail process Can now run arbitrary code under root ID This is what you will be doing in Project 3 What if you get the ACL wrong? Frighteningly common “Usability and privacy: a study of Kazaa…” Good and Krekelberg, SigCHI 2003 In 12 hours, found 150 inboxes on Kazaa Observed people downloading them Many other examples: web, NFS, etc People don’t understand software interactions Affects even diligent users … Data breach Snoop Despite her best efforts, Alice’s data is only as secure as her least competent confidant. Alice Bob What to do? Two goals 1. Identify sensitive files 2. Infer who is allowed to view them Many more constraints 1. Don’t impact performance 2. Don’t bother the user 3. Remain backwards-compatible Identifying sensitive files Approach: identify the common case I download sensitive docs from many sources mail.cs.duke.edu What do the examples have in common? Encryption! Identifying sensitive files Approach: identify the common case I download sensitive docs from many sources Insight: data is often encrypted for a reason In our examples, servers allow clients to cache sensitive file To protect those files in the network, communication is encrypted How can we tell if network communication is encrypted? Use the port (e.g., 443 is used for HTTPS) Ciphertext exhibits high “entropy” or randomness Entropy = information density, measured in bits/byte Can inexpensively measure the entropy of every socket If data stream exhibits high entropy, see if “resulting data” hits FS If so, tag the file as sensitive Download entropy scores Bits-per-byte Plaintext HTTPS 8 7 6 5 4 3 2 1 0 Gradesheet Email Bank Tax return statement How can we distinguish compression from encryption? if compressed format, use data-similarity if actual compression, measure relative data volumes Image Capabilities At each user Store a list of objects they may access Store how those objects may be accessed Example At user2, store ‹file2, r: file3, rw› On each access User presents capability Capability proves they are authorized Capabilities Possession of capability provides authorization Like car keys If you have the door key, you can get in the car If you have the ignition key, you can drive it How can you defeat a capability system? Forge an authorized user’s capabilities Like making a copy of someone’s car keys Capabilities Solution to forged capabilities? “Self-authenticating” capabilities E.g. include {user, filename}system-key Mastercard number, expiration date Have these, you can charge to account What makes these hard to forge? Random numbers in a sparse space Revocation How to revoke permissions with an ACL? Just delete/change the ACL entry How to revoke permissions with capabilities? Not nearly as easy Could change the car door’s lock This revokes access for everyone with the key Dealing with security Principle of least privilege Users get minimum privilege to do their work Defense in depth Create multiple levels Lower-levels monitor higher ones Minimize trusted computing base Every system has a trusted part (e.g. the thing that checks the ACL) This part has special privileges It cannot be subject to the security mechanism Reducing the trusted base Why reduce the size of the trusted base? Small and simple = easier to reason about Hopefully fewer bugs and security loopholes Most systems Kernel is trusted Special user (root, Administrator) is trusted Emerging systems Virtual machines: trust the hypervisor Trusted Platform Modules: hardware attests to software Common attacks Hidden channels Buffer overflow Trojan horse Hidden channels Get system to communicate in unintended ways Example: tenex (supposedly secure OS) Created a team to break in Team had all passwords within 48 hours … oops. Password checker for (i=0; i<8; i++) { if (input[i] != password[i]) { break; } } Goal: require 256^8 tries to see if password is right Hidden channels: tenex Password checker for (i=0; i<8; i++) { if (input[i] != password[i]) { break; } } How to break? (hint: vm faults are visible) Specially arrange the input’s layout in memory Force a page fault if second character is read If you get a fault, the first character was right Do again for third, fourth, … eighth character Can check the password in 256*8 tries Buffer overflow Suppose you have an on-stack buffer You read input into the buffer You don’t check the length of the input A villain’s long input can corrupt the stack If they’re smart, they can corrupt the return value Allows villain to jump to their own code First Internet worm did this Creator was a Cornell grad student, Robert Morris Son of a famous computer security guru He is now a professor at MIT Trojan horse Basic idea Give somebody something useful Make it do something evil Examples Replace login with something that emails passwords Send a Word document with a malicious macro Download Kazaa and install spyware Install a Sony CD and have it install a buggy DRM root-kit Why security is so hard Ken Thompson’s self-replicating code Goal: put a backdoor into login Allow “ken” to login as root w/o password 1.Make it possible (easy) 2.Hide it (tricky) Login backdoor Step 1: modify login.c (code A) if (name == “ken”) login as root This is obvious so how do we hide it? Step 2: modify C compiler (code B) if (compiling login.c) compile A into binary Removes code A from login.c, keeps backdoor This is now obvious in the compiler, how do we hide it? Step 3: distribute a buggy C compiler binary (code C) if (compiling C compiler) compile code B into binary No trace of attack in any source code (only in C compiler’s binary) Upshot: everything is bootstrapped, everything trusts something Conficker Conficker and DNS Problem: how to update your malware? Why not just host at a web site, have malware check? Easy to catch Monitor malware, see which web sites they connect to Follow the paper trail from that site Instead, each malware client computes random DNS names Conficker and DNS The name of each generated domain is 4 to 10 characters, to which a randomly selected TLD is appended from the following list of 116 suffix (mapping to 110 TLDs): [ "ac" , "ae" , "ag" , "am" , "as" , "at" , "be" , "bo" , "bz" , "ca" , "cd" , "ch" , "cl" , "cn" , "co.cr" , "co.id" , "co.il" , "co.ke" , "co.kr" , "co.nz" , "co.ug" , "co.uk" , "co.vi" , "co.za" , "com.ag" , "com.ai" , "com.ar" , "com.bo" , "com.br" , "com.bs" , "com.co" , "com.do" , "com.fj" , "com.gh" , "com.gl" , "com.gt" , "com.hn" , "com.jm" , "com.ki" , "com.lc" , "com.mt" , "com.mx" , "com.ng" , "com.ni" , "com.pa" , "com.pe" , "com.pr" , "com.pt" , "com.py" , "com.sv" , "com.tr" , "com.tt" , "com.tw" , "com.ua" , "com.uy" , "com.ve" , "cx" , "cz" , "dj" , "dk" , "dm" , "ec" , "es" , "fm" , "fr" , "gd" , "gr" , "gs" , "gy" , "hk" , "hn" , "ht" , "hu" , "ie" , "im" , "in" , "ir" , "is" , "kn" , "kz" , "la" , "lc" , "li" , "lu" , "lv" , "ly" , "md" , "me" , "mn" , "ms" , "mu" , "mw" , "my" , "nf" , "nl" , "no" , "pe" , "pk" , "pl" , "ps" , "ro" , "ru" , "sc" , "sg" , "sh" , "sk" , "su" , "tc" , "tj" , "tl" , "tn" , "to" , "tw" , "us" , "vc" , "vn" ] Checks to see if they can download from site until it hits Why is it doing this? What aspects of DNS does this take advantage of? Coming up More security (project 3) Storage and files Google