CPS110: General security Landon Cox

advertisement
CPS110:
General security
Landon Cox
Intro: general security
 Hardware reality
 Processor, memory, disks, NIC
 Components can be used by anyone
 OS abstraction
 Controlled access to hardware
Security abstractions
 What HW primitives exist for access control?
 Processor mode bit (kernel/user mode)
 Protected instructions (I/O instructions, halt)
 Protected data (page tables, interrupt vector)
 Want to build two abstractions on these
 Identity (authentication)
 Who are you?
 Security policy (authorization)
 What are you allowed to do?
Building up from the hardware
 One of the themes of this class
 Threads
 Atomic test&set, interrupt enable/disable
 Transactions
 Single disk sector write
 Reliable communication
 Atomic packet send
Building up from the hardware
 Already built on top of HW security primitives
 Secure sharing of physical memory
 Use protection between address spaces
 (fault on unauthorized access)
 Secure sharing of kernel services
 Use system calls to safely transition to privileged mode
 (trap after syscall instruction)
Authentication: who are you?
 Prove your identity to the OS
 Many ways to authenticate
1. Passwords
2. Physical tokens
3. Biometrics
Password authentication
 Password
 Shared secret between you and the OS
 Several weaknesses
 How should we store passwords?
 Cryptographic hashes (MD5, SHA)
 Hashes are one-way functions
 Check that hash(input) = hash(password)
Cryptographic hashes
SHA1 hash
“Hash digest”
Arbitrarily large
160 bits
 Infeasible to reverse (even for the system!)
 I.e. cannot compute plaintext from digest
 Collision-resistant hash function
 Probability (collision) < probability (HW error)
 Extremely useful tool in lots of domains
 Can be used as a shorthand for naming objects (e.g. files)
Password authentication
 What’s the weakest link in password systems?
 People!
 People choose short passwords
 (brute force attacks take less time)
 People choose easy-to-remember passwords
 (narrows the search space: dictionary attack)
 People choose the same password for everything
 (break the weakest web site, gain access to the strongest)
 How could you use cryptographic hashes to solve this?
 User remembers one password, systems sends Hash(pass + site name)
Weak passwords
 Consider an 8-character password
 256^8 possible passwords
 2^64 ~ 16 * billion * billion
 If you only choose lower-case letters
 26^8 ~ 200 billion
 If you choose a word from the dictionary
 320,000 words in Webster’s unabridged
 1000/second  find a password in 5 minutes
 Researchers ran an attack on Michigan passwords
 Recovered thousands; some of the most popular: “beer”, “hockey”
 Similar result at Berkeley CS in the 80s: “gandolph”
Physical token authentication
 Require something physical
 E.g. a ticket to a dance performance
 What if your token is stolen/forged?
 Require a physical token + password
 E.g. your ATM card + PIN
Biometric authentication
 Essentially a physical token
 One that should be hard to steal/forge
 Examples
 Thumbprint, retina scan, signature
 Most have accuracy problems
 False positives (accept invalid person)
 False negatives (reject valid person)
Real-world authentication
 How to authenticate to your credit card company?
 Give them your social security number
 They ask over the phone and check
 How is this vulnerable?
 Company has my SSN, so they can pretend to be me
 (or someone who breaks into them can pretend to be me)
 Phone line snoop can pretend to be me
Authorization: what can you do?
 Fundamental structure
 Access control matrix
 Rows = authentication domains
 Columns = objects
File 1
File 2
File 3
User 1
RW
RW
RW
User 2
--
R
RW
 Two approaches to storing data:
 Access control lists and capabilities
Access control lists
 Standard for file systems
 At each object
 Store who can access the object
 Store how the object can be accessed
 On each access
 Check that the user has proper permissions
 Use groups to make things more convenient
 E.g. forbes, chase, and lpcox are all in group “prof”
Access control lists
 How can you defeat ACL authorization?
 Villain convinces the OS it is someone else
 Example




Sendmail runs as root (full privileges)
Attacker compromises sendmail process
Can now run arbitrary code under root ID
This is what you will be doing in Project 3
 What if you get the ACL wrong?
Frighteningly common
 “Usability and privacy: a study of Kazaa…”
 Good and Krekelberg, SigCHI 2003
 In 12 hours, found 150 inboxes on Kazaa
 Observed people downloading them
 Many other examples: web, NFS, etc
 People don’t understand software interactions
 Affects even diligent users …
Data breach
Snoop
Despite her best efforts,
Alice’s data is only as
secure as her least
competent confidant.
Alice
Bob
What to do?
 Two goals
1. Identify sensitive files
2. Infer who is allowed to view them
 Many more constraints
1. Don’t impact performance
2. Don’t bother the user
3. Remain backwards-compatible
Identifying sensitive files
 Approach: identify the common case
 I download sensitive docs from many sources
mail.cs.duke.edu
What do the examples have in common? Encryption!
Identifying sensitive files
 Approach: identify the common case
 I download sensitive docs from many sources
 Insight: data is often encrypted for a reason
 In our examples, servers allow clients to cache sensitive file
 To protect those files in the network, communication is encrypted
 How can we tell if network communication is encrypted?
 Use the port (e.g., 443 is used for HTTPS)
 Ciphertext exhibits high “entropy” or randomness
 Entropy = information density, measured in bits/byte
 Can inexpensively measure the entropy of every socket
 If data stream exhibits high entropy, see if “resulting data” hits FS
 If so, tag the file as sensitive
Download entropy scores
Bits-per-byte
Plaintext
HTTPS
8
7
6
5
4
3
2
1
0
Gradesheet
Email
Bank
Tax return
statement
How can we distinguish compression from encryption?
if compressed format, use data-similarity
if actual compression, measure relative data volumes
Image
Capabilities
 At each user
 Store a list of objects they may access
 Store how those objects may be accessed
 Example
 At user2, store ‹file2, r: file3, rw›
 On each access
 User presents capability
 Capability proves they are authorized
Capabilities
 Possession of capability provides authorization
 Like car keys
 If you have the door key, you can get in the car
 If you have the ignition key, you can drive it
 How can you defeat a capability system?
 Forge an authorized user’s capabilities
 Like making a copy of someone’s car keys
Capabilities
 Solution to forged capabilities?
 “Self-authenticating” capabilities
 E.g. include {user, filename}system-key
 Mastercard number, expiration date
 Have these, you can charge to account
 What makes these hard to forge?
 Random numbers in a sparse space
Revocation
 How to revoke permissions with an ACL?
 Just delete/change the ACL entry
 How to revoke permissions with capabilities?
 Not nearly as easy
 Could change the car door’s lock
 This revokes access for everyone with the key
Dealing with security
 Principle of least privilege
 Users get minimum privilege to do their work
 Defense in depth
 Create multiple levels
 Lower-levels monitor higher ones
 Minimize trusted computing base




Every system has a trusted part
(e.g. the thing that checks the ACL)
This part has special privileges
It cannot be subject to the security mechanism
Reducing the trusted base
 Why reduce the size of the trusted base?
 Small and simple = easier to reason about
 Hopefully fewer bugs and security loopholes
 Most systems
 Kernel is trusted
 Special user (root, Administrator) is trusted
 Emerging systems
 Virtual machines: trust the hypervisor
 Trusted Platform Modules: hardware attests to software
Common attacks
 Hidden channels
 Buffer overflow
 Trojan horse
Hidden channels
 Get system to communicate in unintended ways
 Example: tenex (supposedly secure OS)
 Created a team to break in
 Team had all passwords within 48 hours … oops.
Password checker
for (i=0; i<8; i++) {
if (input[i] != password[i]) {
break;
}
}
 Goal: require 256^8 tries to see if password is right
Hidden channels: tenex
Password checker
for (i=0; i<8; i++) {
if (input[i] != password[i]) {
break;
}
}
 How to break? (hint: vm faults are visible)




Specially arrange the input’s layout in memory
Force a page fault if second character is read
If you get a fault, the first character was right
Do again for third, fourth, … eighth character
 Can check the password in 256*8 tries
Buffer overflow
 Suppose you have an on-stack buffer
 You read input into the buffer
 You don’t check the length of the input
 A villain’s long input can corrupt the stack
 If they’re smart, they can corrupt the return value
 Allows villain to jump to their own code
 First Internet worm did this
 Creator was a Cornell grad student, Robert Morris
 Son of a famous computer security guru
 He is now a professor at MIT
Trojan horse
 Basic idea
 Give somebody something useful
 Make it do something evil
 Examples




Replace login with something that emails passwords
Send a Word document with a malicious macro
Download Kazaa and install spyware
Install a Sony CD and have it install a buggy DRM root-kit
Why security is so hard
 Ken Thompson’s self-replicating code
 Goal: put a backdoor into login
 Allow “ken” to login as root w/o password
1.Make it possible (easy)
2.Hide it (tricky)
Login backdoor
 Step 1: modify login.c
 (code A) if (name == “ken”) login as root
 This is obvious so how do we hide it?
 Step 2: modify C compiler
 (code B) if (compiling login.c) compile A into binary
 Removes code A from login.c, keeps backdoor
 This is now obvious in the compiler, how do we hide it?
 Step 3: distribute a buggy C compiler binary
 (code C) if (compiling C compiler) compile code B into binary
 No trace of attack in any source code (only in C compiler’s binary)
 Upshot: everything is bootstrapped, everything trusts something
Conficker
Conficker and DNS
 Problem: how to update your malware?
 Why not just host at a web site, have malware check?
 Easy to catch
 Monitor malware, see which web sites they connect to
 Follow the paper trail from that site
 Instead, each malware client computes random DNS names
Conficker and DNS
The name of each generated domain is 4 to 10 characters, to which a randomly selected TLD is
appended from the following list of 116 suffix (mapping to 110 TLDs):
[ "ac" , "ae" , "ag" , "am" , "as" , "at" , "be" , "bo" , "bz" , "ca" , "cd" , "ch" , "cl" , "cn" , "co.cr" ,
"co.id" , "co.il" , "co.ke" , "co.kr" , "co.nz" , "co.ug" , "co.uk" , "co.vi" , "co.za" , "com.ag" ,
"com.ai" , "com.ar" , "com.bo" , "com.br" , "com.bs" , "com.co" , "com.do" , "com.fj" , "com.gh"
, "com.gl" , "com.gt" , "com.hn" , "com.jm" , "com.ki" , "com.lc" , "com.mt" , "com.mx" ,
"com.ng" , "com.ni" , "com.pa" , "com.pe" , "com.pr" , "com.pt" , "com.py" , "com.sv" , "com.tr"
, "com.tt" , "com.tw" , "com.ua" , "com.uy" , "com.ve" , "cx" , "cz" , "dj" , "dk" , "dm" , "ec" , "es"
, "fm" , "fr" , "gd" , "gr" , "gs" , "gy" , "hk" , "hn" , "ht" , "hu" , "ie" , "im" , "in" , "ir" , "is" , "kn" ,
"kz" , "la" , "lc" , "li" , "lu" , "lv" , "ly" , "md" , "me" , "mn" , "ms" , "mu" , "mw" , "my" , "nf" , "nl"
, "no" , "pe" , "pk" , "pl" , "ps" , "ro" , "ru" , "sc" , "sg" , "sh" , "sk" , "su" , "tc" , "tj" , "tl" , "tn" ,
"to" , "tw" , "us" , "vc" , "vn" ]
 Checks to see if they can download from site until it hits
 Why is it doing this?
 What aspects of DNS does this take advantage of?
Coming up
 More security (project 3)
 Storage and files
 Google
Download