protection

Protection, identity, and trust
Jeff Chase
Duke University
Processes and you
• Your processes act on your behalf.
• They represent you to the outside world.
• They can spy on you.
• Why do you trust them?
• What is “trust” anyway?
• How might an attacker exploit your trust?
• What about servers?
The need for access control
• Processes run on behalf of users. (“subjects”)
• Processes create/read/write/delete files. (“objects”)
• The OS kernel mediates these accesses.
• How should the kernel determine which subjects can
access which objects?
This problem is called access control
or authorization (“authz”). It also
encompasses the question of who is
authorized to make a given statement.
The concepts are general, but we can
consider Unix as an initial example.
Concept: reference monitor
Reference monitor
Example: Unix kernel
Isolation boundary
Principles of Computer System Design  Saltzer & Kaashoek 2009
More terminology
• A reference monitor monitors references to its objects…
• …by a subject (one or more), also called a principal.
– I generally reserve the term “principal” for an entity with
responsibility and accountability in the real world (e.g.: you).
– A subject identity may be a program speaking for a user, which is
distinct from the user herself.
• The reference monitor must determine the true and
correct identity of the subject.
– This is called authentication. Example: system login
– We’ll return to this later. In Unix, once a user is logged in, the
kernel maintains the identity across chained program executions.
– Things are different on a network, where there is no single trusted
kernel to anchor all interactions.
Reference monitor
[Figure: a subject (a program with an identity) sends a requested operation across the “boundary”; a guard checks it before it reaches the protected state/objects.]
What is the nature of the isolation boundary?
If we’re going to post a guard, there should also be a wall.
Otherwise somebody can just walk in past the guard, right?
Example: Unix kernel
• The Unix kernel is an example of a reference monitor.
– Reference monitor is a general concept.
– We’re going to keep most of the discussion general and abstract.
The kernel protects all objects
accessed by user processes
through the syscall API.
How does the kernel protect
itself? What is the boundary?
Concept: access control matrix
We can imagine the set of all allowed accesses for
all subjects over all objects as a huge matrix.
        obj1   obj2
Alice   RW     ---
Bob     R      RW
How is the matrix stored?
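A minimal sketch of the matrix as a literal two-dimensional table, using the Alice/Bob example above. The names `allowed`, `find`, and the `R`/`W` bit flags are illustrative assumptions, not part of any real system; unknown subjects and objects are simply denied.

```c
#include <assert.h>
#include <string.h>

/* Rights are bit flags so that "RW" is R | W. */
enum { R = 1, W = 2 };

#define NSUBJECTS 2
#define NOBJECTS  2

static const char *subjects[NSUBJECTS] = { "Alice", "Bob" };
static const char *objects[NOBJECTS]   = { "obj1", "obj2" };

/* matrix[s][o] holds the rights of subject s over object o. */
static const int matrix[NSUBJECTS][NOBJECTS] = {
    /* Alice */ { R | W, 0     },
    /* Bob   */ { R,     R | W },
};

/* Return the index of name in names[0..n-1], or -1 if absent. */
static int find(const char *names[], int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    return -1;
}

/* Return 1 if subject holds every right in `want` over object, else 0. */
int allowed(const char *subject, const char *object, int want) {
    int s = find(subjects, NSUBJECTS, subject);
    int o = find(objects, NOBJECTS, object);
    if (s < 0 || o < 0)
        return 0;               /* unknown names are denied */
    return (matrix[s][o] & want) == want;
}
```

For example, `allowed("Bob", "obj1", W)` is 0 because Bob holds only R on obj1. A dense table like this does not scale to real systems, which is why the storage question matters.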
Concept: ACLs and capabilities
How is the matrix stored?
        obj1   obj2
Alice   RW     ---
Bob     R      RW
[Figure labels: a column of the matrix (one object) is an ACL; a row (one subject) is a capability list.]
• Capabilities: each subject holds a list of its rights and presents
them as proof of access rights. A capability is an unforgeable
token that confers rights to access a specific object.
• Access control list (ACL): each object stores a list of subjects
permitted to access it.
Concept: roles or groups
roles, groups:
         wizard   student
Alice    yes      no
Bob      no       yes

objects:
          obj1   obj2
wizard    RW     ---
student   R      RW
• A role or group is a named set S of subjects. Or, equivalently, it
is a named boolean predicate that is true for s iff s ∈ S.
• Roles/groups provide a level of indirection that simplifies the
access control matrix, and makes it easier to store and manage.
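The indirection can be sketched in a few lines: rights are granted to roles, and a subject gets a right only through a role it holds. This is an illustrative sketch using the wizard/student example; `has_role`, `role_allowed`, and the table layouts are assumed names, not an existing API.

```c
#include <assert.h>
#include <string.h>

enum { R = 1, W = 2 };
enum role { WIZARD, STUDENT, NROLES };

/* Membership table: which subject holds which role. */
struct member { const char *subject; enum role role; };
static const struct member members[] = {
    { "Alice", WIZARD },
    { "Bob",   STUDENT },
};

/* Rights are granted to roles, not to individual subjects. */
struct grant { const char *object; int rights[NROLES]; };
static const struct grant grants[] = {
    { "obj1", { [WIZARD] = R | W, [STUDENT] = R     } },
    { "obj2", { [WIZARD] = 0,     [STUDENT] = R | W } },
};

/* The role predicate: true iff subject is a member of role. */
int has_role(const char *subject, enum role role) {
    for (size_t i = 0; i < sizeof members / sizeof members[0]; i++)
        if (members[i].role == role &&
            strcmp(members[i].subject, subject) == 0)
            return 1;
    return 0;
}

/* Allow the access if some role held by subject grants all of `want`. */
int role_allowed(const char *subject, const char *object, int want) {
    for (size_t i = 0; i < sizeof grants / sizeof grants[0]; i++) {
        if (strcmp(grants[i].object, object) != 0)
            continue;
        for (int r = 0; r < NROLES; r++)
            if (has_role(subject, (enum role)r) &&
                (grants[i].rights[r] & want) == want)
                return 1;
    }
    return 0;
}
```

Adding a new student means adding one row to `members`; no object's grants change.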
Attributes and policies
name = “Alice”
wizard = true
Name = “obj1”
access: wizard
name = “Bob”
student = true
Name = “obj2”
access: student
• More generally, we may view a subject/object identity as a list
of named attributes with typed values (e.g., boolean).
• Generally, an access policy for a requested access is a
boolean (logic) function over the subject and object identities.
Access control: general model
• We use this simple model for identity and authorization.
• Real-world access control schemes use many variants of
and restrictions of the model, with various terminology.
– Role-Based Access Control (RBAC)
– Attribute-Based Access Control (ABAC)
• A guard is the code that checks access for an object, or
(alternatively) the access policy that the code enforces.
• There are many extensions. E.g., some roles are nested (a
partial order or lattice). The guard must reason about them.
– Security clearance level TopSecret ⊆ Secret [subset]
– Equivalently: TopSecret(s) ⇒ Secret(s) [logical implication]
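One simple way for a guard to reason about nested roles is to encode them as ordered levels, so that holding a higher level implies holding every lower one. The sketch below assumes a total order (a special case of a lattice); the `UNCLASSIFIED` level, the struct layouts, and the function names are illustrative assumptions.

```c
#include <assert.h>

/* Clearance levels ordered from least to most privileged. */
enum clearance { UNCLASSIFIED = 0, SECRET = 1, TOP_SECRET = 2 };

/* A subject and an object, each labeled with attributes. */
struct subject { const char *name; enum clearance level; };
struct object  { const char *name; enum clearance required; };

/* Role predicates: a TopSecret subject also satisfies Secret. */
int is_secret(const struct subject *s)     { return s->level >= SECRET; }
int is_top_secret(const struct subject *s) { return s->level >= TOP_SECRET; }

/* The access policy is a boolean function over both labels. */
int may_read(const struct subject *s, const struct object *o) {
    return s->level >= o->required;
}
```

With this encoding, `is_top_secret(s)` can never be true while `is_secret(s)` is false, which is exactly the implication the guard relies on.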
Identity in an OS (Unix)
• Every process has a security label that
governs access rights granted to it by kernel.
• Abstractly: the label is a list of named
attributes and values. An OS defines a
space of attributes and their interpretation.
• Some attributes and values represent an
identity bound to the process.
• Unix: e.g., userID: uid
uid=“alice”
Every Unix user account
has an associated userID
(uid). (an integer)
credentials
uid
There is a special administrator uid==0 called root or superuser or su.
Root “can do anything”: normal access checks do not apply to root.
Unix: identity and access
• A Unix program running in a process
with Alice’s identity can do anything
that Alice herself can do.
• In particular, a process running as
Alice can access any of Alice’s files.
– Saved passwords, security keys, etc.
– Programs and their configuration files
– Browser history, cookies…
• How does Alice sleep at night?
• How does the system determine what
program executions can run as Alice?
Alice
Init and Descendants
Kernel “handcrafts”
initial root process to
run “init” program.
Other processes descend from
init, and also run as root,
including user login guards.
Login invokes a setuid
system call to run user
shell in a child process
after user authenticates.
Children of user shell
inherit the user’s
identity (uid).
Protection systems
Every file and every process is labeled/tagged with a
user ID. A root process may change its user ID.
A process inherits its userID
from its parent process.
[Figure: Alice logs in; login does fork, setuid(“alice”), exec to start the shell; the shell does fork/exec to start a tool, which runs with uid=“alice”.]
• Every process (or other entity) has a
label that governs its access rights
on the system.
• The label is a list of named attributes
and optional values.
• Each system defines the space of
attributes and their interpretation.
• Some attributes and values represent
an identity bound to the process.
• E.g.: uid=“alice”
Unix: setuid and login
• A process with uid==root may change its
userID with the setuid system call.
• This means that a root process can speak
for any user or act as any user, if it tries.
• This mechanism enables system
processes to set up an environment for a
user after login.
[Figure: Alice logs in; login does fork, setuid(“alice”), exec to start the shell; the shell does fork/exec to start a tool, which runs with uid=“alice”.]
A privileged login program verifies a user
password and starts a command interpreter
(shell) and/or window manager for a logged-in
user. A user may then interact with a shell to
direct launch of other programs. They run as
children of the shell, with the user’s uid.
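The fork, setuid, exec sequence can be sketched as below. This is a simplified illustration, not the code of any real login program: `run_as` is a hypothetical helper, and a real login would also authenticate the user and set group IDs before giving up root. Only a root process can switch to an arbitrary uid; an unprivileged process can only "switch" to the uid it already has.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, switch the child to `uid`, and exec `prog` (found via PATH).
   Returns the child's exit status, or -1 on failure. */
int run_as(uid_t uid, const char *prog) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: change identity first, then replace the program image.
           A real login would also call setgid/initgroups here. */
        if (setuid(uid) != 0)
            _exit(126);
        execlp(prog, prog, (char *)NULL);
        _exit(127);             /* reached only if exec failed */
    }
    /* Parent: keeps its own identity and waits for the child. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The order matters: after setuid the child no longer has root's powers, so everything that needs them must happen before that call.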
Inside the Shell
while (1) {
    char *cmd = getcmd();   // getcmd() is a placeholder: read one command name
    pid_t retval = fork();
    if (retval == 0) {
        // This is the child process
        // Set up the child's process environment here
        // E.g., where is standard I/O, how to handle signals?
        execlp(cmd, cmd, (char *)NULL);
        // exec does not return if it succeeds
        printf("ERROR: Could not execute %s\n", cmd);
        exit(1);
    } else {
        // This is the parent process; wait for child to finish
        // (a negative retval means fork failed; handling omitted)
        pid_t pid = retval;
        waitpid(pid, NULL, 0);
    }
}
Reference monitor revisited
[Figure: a subject (a program with an identity, i.e., with attributes) sends a requested operation across the “boundary” to the protected state/objects.]
The reference monitor pattern can work with various
protection mechanisms. Let’s talk about it generally. Later
we’ll see how it applies to servers as well.
Labels and access control
Every file and every process is
labeled/tagged with a user ID.
A privileged process may
set its user ID.
A process inherits its userID
from its parent process.
A file inherits its owner userID from
its creating process.
[Figure: Alice and Bob each log in; login does fork, setuid(“alice”) or setuid(“bob”), exec to start a shell; each shell does fork/exec to start a tool. Alice’s tool (uid=“alice”) calls creat(“foo”), write, close, so file foo has owner=“alice”. Bob’s tool (uid=“bob”) calls open(“foo”), read.]
Labels and access control
Every system defines rules for
assigning security labels to
subjects (e.g., Bob’s process)
and objects (e.g., file foo).
Every system defines rules to
compare the security labels to
authorize attempted accesses.
Should processes running with
Bob’s userID be permitted to
open file foo?
[Figure: Alice and Bob each log in and get a shell. Alice’s tool (uid=“alice”) calls creat(“foo”), write, close, so file foo has owner=“alice”. Bob’s tool (uid=“bob”) calls open(“foo”), read.]
File permissions in Unix (vanilla)
• The owner of a Unix file may tag it with a mode value specifying
access rights for subjects. (Don’t confuse with kernel/user mode.)
– Access types = {read, write, execute} [3 bits]
– Subject types = {owner, group, other/anyone} [3 bits]
– If the file is executed, should the system setuid the process to the
userID of the file’s owner? [1 bit, the setuid bit]
– 10 bits total: (3x3)+1. Usually given in octal: e.g., “777” means 9
bits set: anyone can r/w/x the file, but no setuid.
– Unix “mode bits” are a simple form of an access control list (ACL).
Later systems like AFS have richer ACLs.
• Unix provides a syscall and shell command for owner to set the
permission bits on each file or directory (inode).
– chmod, chown, chgrp
• “Group” was added later and is a little more complicated: a
user may belong to multiple groups.
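The mode-bit check can be sketched as a pure function. This is a simplified model, not kernel code: `mode_allows` and the `WANT_*` flags are illustrative names, the subject has a single group (real users may belong to several), and root bypasses the check entirely.

```c
#include <assert.h>
#include <sys/types.h>

/* Requested access, in the same bit order as one octal digit (rwx). */
enum { WANT_X = 1, WANT_W = 2, WANT_R = 4 };

/* Return 1 if a subject (uid, gid) may perform `want` on a file with the
   given mode bits, owner, and group; otherwise return 0. */
int mode_allows(mode_t mode, uid_t owner, gid_t group,
                uid_t uid, gid_t gid, int want) {
    int bits;
    if (uid == 0)
        return 1;                   /* root: normal checks do not apply */
    if (uid == owner)
        bits = (mode >> 6) & 7;     /* owner bits, and only those */
    else if (gid == group)
        bits = (mode >> 3) & 7;     /* group bits */
    else
        bits = mode & 7;            /* other/anyone bits */
    return (bits & want) == want;
}
```

Exactly one class of bits applies to a given subject. With mode 0077 the owner is denied even though everyone else is allowed, because the owner class is matched first.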
Another way to setuid
• A program file itself may be tagged to
setuid to program owner’s uid on exec.
• This is called the setuid bit. Some call it the
most important innovation of Unix.
• It enables users to request protected ops
by running secure programs.
• The user cannot choose the code: only the
program owner can choose the code to
run with the owner’s uid.
• Parent cannot subvert/control a child after
program launch: a property of Unix exec.
[Figure: Alice logs in; login does fork, setuid(“alice”), exec to start the shell; the shell does fork, exec(“sudo”); /usr/bin/sudo has the setuid bit set, so sudo runs with uid=“root”; sudo does fork, exec(“tool”), and tool runs with uid=“root”.]
Example: sudo program runs as root, checks
user authorization to act as superuser, and (if
allowed) executes requested command as root.
Protected programs
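[Figure: Alice logs in; login does fork, setuid(“alice”), exec to start the shell; the user types “sudo tool”; the shell does fork/exec of sudo, which runs with uid = 0; sudo does fork, exec(“tool”), and tool runs with uid=“root”. Cartoon from xkcd.com.]
With great power comes great responsibility.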
Example: sudo program runs as root, checks
user authorization to act as superuser, and (if
allowed) executes requested command as root.
The secret of setuid
• The setuid bit is best seen as a mechanism for
programs to function as reference monitors.
• A setuid program can govern access to files.
– Protect the files with the program owner’s uid.
– Make them inaccessible to other users.
– Other users can access them via the setuid program.
– But only in the manner permitted by the program.
– Example: “moo accounting problem” in 1974 paper (cryptic)
• What is the reference monitor’s “isolation boundary”?
What protects it from its callers?
• Note: setuid can also be used to restrict the program’s
privileges by stripping it of the user’s uid.
MOO accounting problem
• Multi-player game called Moo
• Want to maintain high score in a file
• Should players be able to update score?
• Yes
• Do we trust users to write file directly?
• No, they could lie about their score
[Figure: game clients (uid x) and (uid y) write “x’s score = 10” and “y’s score = 11” to the high score file.]
[Landon Cox]
MOO accounting problem
• Multi-player game called Moo
• Want to maintain high score in a file
• Could have a trusted process update scores
• Is this good enough?
[Figure: game clients (uid x) and (uid y) report “x’s score = 10” and “y’s score = 11” to a game server, which writes “x:10 y:11” to the high score file.]
[Landon Cox]
MOO accounting problem
• Multi-player game called Moo
• Want to maintain high score in a file
• Could have a trusted process update scores
• Is this good enough?
• Can’t be sure that reported score is genuine
• Need to ensure score was computed correctly
[Figure: game client (uid x) reports “x’s score = 100” and game client (uid y) reports “y’s score = 11” to a game server, which writes “x:100 y:11” to the high score file.]
[Landon Cox]
Access control
• Insight: sometimes simple inheritance of uids is insufficient
• Tasks involving management of “user id” state
• Logging in (login)
• Changing passwords (passwd)
• Why isn’t this code just inside the kernel?
• This functionality doesn’t really require interaction w/ hardware
• Would like to keep kernel as small as possible
• How are “trusted” user-space processes identified?
• Run as super user or root (uid 0)
• Like a software kernel mode
• If a process runs under uid 0, then it has more privileges
[Landon Cox]
Access control
• Why does login need to run as root?
• Needs to check username/password correctness
• Needs to fork/exec process under another uid
• Why does passwd need to run as root?
• Needs to modify password database (file)
• Database is shared by all users
• What makes passwd particularly tricky?
• Easy to allow process to shed privileges (e.g., login)
• passwd requires an escalation of privileges
• How does UNIX handle this?
• Executable files can have their setuid bit set
• If setuid bit is set, process inherits uid of image file’s owner on exec
[Landon Cox]
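A process can inspect both sides of this mechanism with ordinary system calls. The sketch below uses `stat` to test a file's setuid bit and `getuid`/`geteuid` to show the real and effective uid of the running process; the helper names `setuid_owner` and `print_identity` are illustrative assumptions. Whether the kernel actually honors the bit also depends on details not modeled here (e.g., the file must be executable).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if path is a regular file with the setuid bit set, 0 if it is
   not, and -1 if the file cannot be examined. When the result is 1,
   *owner receives the file owner's uid, i.e., the uid a process would
   normally take on as its effective uid after exec of this file. */
int setuid_owner(const char *path, uid_t *owner) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISREG(st.st_mode) || !(st.st_mode & S_ISUID))
        return 0;
    *owner = st.st_uid;
    return 1;
}

/* In a setuid program these two differ: the real uid is the invoking
   user, the effective uid is the owner of the program file. */
void print_identity(void) {
    printf("real uid=%ld effective uid=%ld\n",
           (long)getuid(), (long)geteuid());
}
```

On a typical system, calling `setuid_owner` on the passwd program would be expected to report an owner of 0, but the path and result depend on the installation.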
MOO accounting problem
• Multi-player game called Moo
• Want to maintain high score in a file
• How does setuid solve our problem?
• Game executable is owned by trusted entity
• Game cannot be modified by normal users
• Users can run executable though
• High-score is also owned by trusted entity
• This is a form of trustworthy computing
• Only trusted code can update score
• Root ownership ensures code integrity
• Untrusted users can invoke trusted code
[Figure: shells (uid x) and (uid y) each “fork/exec game”; the resulting game clients run as root and write “x’s score = 10” and “y’s score = 11” to the high score file.]
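A sketch of the trusted update step inside such a setuid game. The function name `record_score`, the “uid:score” line format, and the non-negative check are assumptions for illustration. The key point is that the player is identified by the real uid, which the kernel maintains, and the score comes from the game's own computation rather than from anything the player supplies.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Append "uid:score" to the high-score file at path.
   In a setuid program, getuid() still returns the invoking player's uid,
   while the effective uid (the file owner's) is what permits the write.
   Returns 0 on success, -1 on failure. */
int record_score(const char *path, int score) {
    if (score < 0)
        return -1;              /* stand-in for real validation */
    FILE *f = fopen(path, "a");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld:%d\n", (long)getuid(), score) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Players cannot open the score file themselves if it is owned by the trusted entity and not writable by others; they can only reach it through this code path.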
Reference monitor revisited
• What is the nature of the isolation boundary?
Protection: abstract concept
• Programs run with an ability to access some set of
data/state/objects.
• We call that set a domain or protection domain.
– Defines the “powers” that the running code has available to it.
– Determined by context and/or identity (label/attributes).
– The domain is not (directly) a property of the program itself.
– Protection domains may overlap.
Some mechanism must
prevent the code from
accessing state outside of
its domain. Examples?
Protection
• Programs may invoke other programs.
– E.g., call a procedure/subprogram/module.
– E.g., launch a program with exec.
• Often/typically/generally the invoked program runs
in the same protection domain as the caller.
• (That is, it has the same privilege over some set of objects. It
may also have some private state. We’re abstracting.)
Examples
?
Protection
• Real systems always use protection.
• They have various means to invoke a program with
more/less/different privilege than the caller.
• In the reference monitor pattern, B has powers
that A does not have, and that A wants. B executes
operations on behalf of A, with safety checks.
Why?
A
B
Be sure
that you
understand
the various
examples
from Unix.
Subverting protection
• If we talk about protection systems, we must also talk
about how they can fail.
• One way is for an attacker to breach the boundary.
• A more common way is for an attacker to somehow
choose the code that executes inside the boundary.
• For example, it can replace the guard, install a
backdoor, or just execute malicious ops directly.
Inside job
Install or control code
inside the boundary.
But how?
Program integrity
Malware
[Source: somewhere on the web.]
[Source: Google Chrome comics.]
Any program you install or run can be a
Trojan Horse vector for a malware payload.
Trusting Programs
• In Unix
– Programs you run use your identity (process UID).
– Maybe you even saved them with setuid so others
who trust you can run them with your UID.
– The programs that run your system run as root.
• You trust these programs.
– They can access your files
– send mail, etc.
– Or take over your system…
• Where did you get them?
Trusting Trust
• Perhaps you wrote them yourself.
– Or at least you looked at the source code…
• You built them with tools you trust.
• But where did you get those tools?
Where did you get those tools?
• Thompson’s observation: compiler hacks
cover tracks of Trojan Horse attacks.
Login backdoor: the Thompson Way
• Step 1: modify login.c
– (code A) if (name == “ken”) login as root
– This is obvious so how do we hide it?
• Step 2: modify C compiler
– (code B) if (compiling login.c) compile A into binary
– Remove code A from login.c, keep backdoor
– This is now obvious in the compiler, how do we hide it?
• Step 3: distribute a buggy C compiler binary
– (code C) if (compiling C compiler) compile code B into binary
– No trace of attack in any (surviving) source code
Reflections on trust
• What is “trust”?
– Trust is a belief that some program or entity will be faithful to its
expected behavior.
• Thompson’s parable shows that trust in programs is based on a
chain of reasoning about trust.
– We often take the chain for granted. And some link may be fragile.
– Corollary: successful attacks can propagate down the chain.
• The trust chain concept also applies to executions.
– We trust whoever launched the program to select it, configure it, and
protect it. But who booted your kernel?
How much damage can malware do?
• If it compromises a process with your uid?
– Read and write your files?
– Overwrite other programs?
– Install backdoor entry points for later attacks?
– Attack your account on other machines?
– Attack your friends?
– Attack other users?
– Subvert the kernel?
What if an attacker compromises a
process running as root?
Process-based protection,
isolation, interprocess
communication, and servers,
and…
Isolation
We need protection domains and protected contexts (“sandboxes”), even on
single-user systems like your smartphone. There are various dimensions of
isolation for protected contexts (e.g., processes):
• Fault isolation. One app or app instance (context or process) can fail
independently of others. If it fails, the OS can kill the process and reclaim
all of its memory, etc.
• Performance isolation. The OS manages resources (“metal and glass”:
computing power, memory, disk space, I/O throughput capacity, network
capacity, etc.). Each instance needs the “right amount” of resources to run
properly. The OS prevents apps from impacting the performance of other
apps. E.g., the OS can prevent a program with an endless loop from
monopolizing the processor or “taking over the machine”. (How?)
• Security. An app may contain malware that tries to corrupt the system,
steal data, or otherwise compromise the integrity of the system. The OS
uses protected contexts and a reference monitor to check and authorize all
accesses to data or objects outside of the context.
Example: Chrome browser
[Google Chrome Comics]
Processes in the browser
Chrome makes an
interesting choice here.
But why use processes?
[Google Chrome Comics]
Problem: heap memory and fragmentation
[Google Chrome Comics]
Solution: whack the whole process
When a process
exits, all of its virtual
memory is reclaimed
as one big slab.
[Google Chrome Comics]
Processes for fault isolation
[Google Chrome Comics]
[Google Chrome Comics]
Concept: enforced modularity
pipe
(or other
channel)
An important theme
By putting each program or component instance in a separate
process, we can enforce modularity boundaries among
components. Each component runs in a sandbox: they can
interact only through pipes or some other channel. Neither can
access the internals of the other.
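A minimal sketch of two components separated by a process boundary and joined by a pipe. The helper name `receive_from_child` is hypothetical. The child can hand data to the parent only by writing bytes into the pipe; neither side can touch the other's memory.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a "component" in a child process that sends msg through a pipe.
   The parent reads up to cap-1 bytes into out and NUL-terminates it.
   Returns the number of bytes received, or -1 on error. */
ssize_t receive_from_child(const char *msg, char *out, size_t cap) {
    int fds[2];
    if (cap == 0 || pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child component: owns only the write end of the channel. */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t w = write(fds[1], msg, len);
        _exit(w == (ssize_t)len ? 0 : 1);
    }
    /* Parent component: owns only the read end. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n = 0;
    while (total < cap - 1 &&
           (n = read(fds[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);      /* reap the child */
    return (ssize_t)total;
}
```

If the child crashes or misbehaves, the worst the parent sees is a short or malformed message, which it can check like any other untrusted input.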
Communication endpoints and channels
endpoint
port
channel
pipe
binding
connection
data transfer
stream
flow
messages
request/reply
“Remote Procedure Call” (RPC)
events
operations
advertise (bind)
listen
connect (bind)
close
write/send
read/receive
If one side advertises a named
endpoint, we call it a server.
If one side initiates a
channel to a named endpoint,
we call it a client.
Interprocess communication (IPC) and
services
RPC
(API)
GET/PUT
(content)
Clients
initiate connection
and send requests.
Server
listens for and
accepts clients,
handles requests,
sends replies
Concept: RPC
Remote Procedure Call (RPC) is request/response interaction
through a published API, using IPC messaging to cross an interprocess boundary.
API stubs generated from an Interface
Description Language (IDL)
Establishing an RPC connection
to a named remote interface is
often called binding.
RPC is used in many standard Internet services. It is also the basis for
component frameworks like DCOM, CORBA, and Android. Software is
packaged into named “objects” or components. Components may publish
interfaces and/or invoke published interfaces of other components.
Components may execute in different processes and/or on different nodes.
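A hand-written sketch of what generated RPC stubs do, for one procedure `add(a, b)`. Everything here is an assumption for illustration: the message structs, `rpc_add`, and `start_add_server` are hypothetical names, the channel is a local socketpair, and the structs are sent as raw bytes, which is acceptable only because both ends run on the same machine. Real RPC systems marshal arguments into a defined wire format.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct add_request { int32_t a, b; };
struct add_reply   { int32_t sum; };

/* Read exactly n bytes; return 0 on success, -1 on error or EOF. */
static int read_all(int fd, void *buf, size_t n) {
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Write exactly n bytes; return 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t n) {
    const char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w <= 0)
            return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Server side: handle requests until the client closes the channel. */
static void serve(int fd) {
    struct add_request req;
    while (read_all(fd, &req, sizeof req) == 0) {
        struct add_reply rep = { req.a + req.b };
        if (write_all(fd, &rep, sizeof rep) != 0)
            break;
    }
}

/* Client stub: looks like a local call, but sends a request message
   and blocks for the reply. Returns 0 on success, -1 on failure. */
int rpc_add(int fd, int32_t a, int32_t b, int32_t *sum) {
    struct add_request req = { a, b };
    struct add_reply rep;
    if (write_all(fd, &req, sizeof req) != 0 ||
        read_all(fd, &rep, sizeof rep) != 0)
        return -1;
    *sum = rep.sum;
    return 0;
}

/* "Binding": start a server process and return the client's end of the
   channel, or -1 on failure. */
int start_add_server(pid_t *server_pid) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        serve(sv[1]);
        _exit(0);
    }
    close(sv[1]);
    *server_pid = pid;
    return sv[0];
}
```

A caller binds once with `start_add_server`, then calls `rpc_add` as often as needed; closing the descriptor ends the server loop.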
Server as reference monitor
[Figure: a subject (a program) sends a requested operation across the “boundary”; a guard checks it before it reaches the protected state/objects.]
What is the nature of the isolation boundary?
Clients can interact with the server only by sending
messages through a socket channel. The server chooses
the code that handles received messages.
“Subsystems”
• A server process may provide trusted system functions
to other processes, outside of the kernel.
– E.g., this code is trusted, but like other processes it cannot
manipulate the hardware state except by invoking the kernel.
• Example: Android Activity Manager
subsystem provides many functions of
Android, e.g., component launch and
brokering of component interactions.
• With no special kernel support! It uses
same syscalls as anyone else.
• AMS controls app contexts by forking
them with a trusted lib, and issuing RPC
commands to that lib.
Android
AMS
subsystem
JVM+lib
Linux
kernel
“binder” messaging driver in kernel
Binding to a service (Android)
In Android, binding is
a protected operation.
public abstract boolean bindService
(Intent service, ServiceConnection conn, int flags)
Connect to an application service, creating it if needed. …The given conn
will receive the service object when it is created and be told if it dies and
restarts. …
This function will throw SecurityException if you do not have permission to
bind to the given service.
Subverting services
• There are lots of security issues here.
• TBD Q: Is networking secure? How can the client and server
authenticate over a network? How can they know the messages
aren’t tampered? How to keep them private? A: crypto.
• TBD Q: Can an attacker inject malware scripting into my browser?
What are the isolation defenses?
• Q for now: Can an attacker penetrate the server, e.g., to choose the
code that runs in the server?
Inside job
Install or control code
inside the boundary.
But how?
“confused deputy”
http://blogs.msdn.com/b/sdl/archive/2008/10/22/ms08-067.aspx
Code
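void cap (char* b){
  for (int i=0;
       b[i]!='\0';
       i++)
    b[i]+=32;
}                        // code address 0x8048361 (in cap)
int main(char*arg) {
  char wrd[4];
  strcpy(wrd, arg);      // copies arg into wrd with no bounds check
  cap (wrd);
  return 0;
}                        // code address 0x804838c (in main): cap's return address (RA)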
What can go wrong?
Can overflow wrd variable …
Overwrite cap’s RA
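Memory
[Figure: the stack, in an address space running from 0xfffffff down to 0x0. cap’s frame holds b=0x00234 and RA=0x804838c. main’s frame holds wrd[3], wrd[2], wrd[1], wrd[0] (wrd starts at address 0x00234) and const2=0.]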
The Point
• You should understand the basics of a
“stack smash” or “buffer overflow” attack as
a basis for pathogens.
• These have caused a lot of pain and damage
in the real world.
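The usual defense at the source level is to check the length before copying. A minimal sketch, with `copy_checked` as a hypothetical helper name: it refuses any input that does not fit in the destination, including the terminating NUL.

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst only if it fits, including the terminating NUL.
   Returns 0 on success, or -1 if src is too long (dst is left untouched). */
int copy_checked(char *dst, size_t dstsize, const char *src) {
    size_t len = strlen(src);
    if (len >= dstsize)
        return -1;              /* would overflow: reject the input */
    memcpy(dst, src, len + 1);  /* len + 1 also copies the NUL */
    return 0;
}
```

With `char wrd[4]`, a three-character input is accepted and a four-character input is rejected, so nothing beyond the buffer (such as a saved return address) is ever written.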
Networked services: big picture
client host
NIC
device
client
applications
kernel
network
software
Internet
“cloud”
Data is sent on the
network as messages
called packets.
server hosts
with server
applications
Sockets
socket
A socket is a buffered
channel for passing
data over a network.
client
int sd = socket(<internet stream>);
gethostbyname(“www.cs.duke.edu”);
<make a sockaddr_in struct>
<install host IP address and port>
connect(sd, <sockaddr_in>);
write(sd, “abcdefg”, 7);
read(sd, ….);
• The socket() system call creates a socket object.
• Other socket syscalls establish a connection (e.g., connect).
• A file descriptor for a connected socket is bidirectional.
• Bytes placed in the socket with write are returned by read in order.
• The read syscall blocks if the socket is empty.
• The write syscall blocks if the socket is full.
• Both read and write fail if there is no valid connection.
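The client steps above can be filled in roughly as follows. This is a sketch under assumptions: it uses `getaddrinfo` (the current POSIX lookup call) in place of `gethostbyname`, and `connect_to` and `listen_on_loopback` are hypothetical helper names. The second helper is included only so the client can be tried on one machine without a remote server.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Client: resolve host, create a stream socket, and connect.
   Returns a connected descriptor, or -1 on failure. */
int connect_to(const char *host, unsigned short port) {
    char portstr[8];
    struct addrinfo hints, *res, *ai;
    int sd = -1;

    snprintf(portstr, sizeof portstr, "%u", (unsigned)port);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;    /* <internet stream> */
    if (getaddrinfo(host, portstr, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        sd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (sd < 0)
            continue;
        if (connect(sd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(sd);
        sd = -1;
    }
    freeaddrinfo(res);
    return sd;
}

/* Local test server: listen on 127.0.0.1 with a kernel-chosen port.
   Stores the port in *port; returns the listening descriptor or -1. */
int listen_on_loopback(unsigned short *port) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int ld = socket(AF_INET, SOCK_STREAM, 0);
    if (ld < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);           /* 0 = let the kernel pick */
    if (bind(ld, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(ld, 8) != 0 ||
        getsockname(ld, (struct sockaddr *)&addr, &len) != 0) {
        close(ld);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return ld;
}
```

Once `connect_to` returns, the descriptor is used with plain `write` and `read`, exactly as in the outline above.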
Unix “file descriptors” illustrated
user space
kernel space
file
int fd
pointer
per-process
descriptor
table
pipe
socket
Disclaimer:
this drawing is
oversimplified
tty
“open file table”
There’s no magic here: processes use read/write (and other syscalls) to
operate on sockets, just like any Unix I/O object (“file”). A socket can
even be mapped onto stdin or stdout.
Deeper in the kernel, sockets are handled differently from files, pipes, etc.
Sockets are the entry/exit point for the network protocol stack.
The network stack, simplified
Internet client host
Internet server host
Client
User code
Server
TCP/IP
Kernel code
TCP/IP
Sockets interface
(system calls)
Hardware interface
(interrupts)
Network
adapter
Hardware
and firmware
Network
adapter
Global IP Internet
Note: the “protocol stack” should not be confused with a thread stack. It’s
a layering of software modules that implement network protocols:
standard formats and rules for communicating with peers over a network.
Ports and packet demultiplexing
Data is sent on the network in messages called packets
addressed to a destination node and port. Kernel network stack
demultiplexes incoming network traffic: choose process/socket to
receive it based on destination port.
Incoming network packets
Network adapter hardware
aka, network interface
controller (“NIC”)
Apps with
open
sockets
TCP/IP Ports
• Each transport endpoint on a host has a logical port
number (16-bit integer) that is unique on that host.
• This port abstraction is an Internet transport-layer concept (TCP and UDP).
– Source/dest port is named in every TCP or UDP packet header.
– Kernel looks at port to demultiplex incoming traffic.
• What port number to connect to?
– We have to agree on well-known ports for common services
– Look at /etc/services
– Ports 1023 and below are ‘reserved’.
• Clients need a return port, but it can be an ephemeral
port assigned dynamically by the kernel.
A peek under the hood
chase$ netstat -s
tcp:
11565109 packets sent
1061070 data packets (475475229 bytes)
4927 data packets (3286707 bytes) retransmitted
7756716 ack-only packets (10662 delayed)
2414038 window update packets
29213323 packets received
1178411 acks (for 474696933 bytes)
77051 duplicate acks
27810885 packets (97093964 bytes) received in-sequence
12198 completely duplicate packets (7110086 bytes)
225 old duplicate packets
24 packets with some dup. data (2126 bytes duped)
589114 out-of-order packets (836905790 bytes)
73 discarded for bad checksums
169516 connection requests
21 connection accepts
inetd
• Classic Unix systems run an
inetd “internet daemon”.
• Inetd receives requests for
standard services.
– Standard services and ports
listed in /etc/services.
– inetd listens on all the ports
and accepts connections.
• For each connection, inetd
forks a child process.
• Child execs the service
configured for the port.
• Child executes the request,
then exits.
[Apache Modeling Project: http://www.fmc-modeling.org/projects/apache]
Children of init:
inetd
New child processes are
created to run network
services.
They may be created on
demand on connect
attempts from the
network for designated
service ports.
Should they run as root?
Multi-process server architecture
• Each of P processes can execute one request at a time,
concurrently with other processes.
• If a process blocks, the other processes may still make
progress on other requests.
• Max # requests in service concurrently == P
• The processes may loop and handle multiple requests
serially, or can fork a process per request.
– Tradeoffs?
• Examples:
– inetd “internet daemon” for standard /etc/services
– Design pattern for (Web) servers: “prefork” a fixed number of
worker processes.
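A sketch of the fork-per-connection variant, in the style of inetd. The names `serve_forking`, `echo_service`, and `listen_on_loopback` are illustrative assumptions, and the “service” here just echoes bytes, where inetd would exec the program configured for the port.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Listen on 127.0.0.1 with a kernel-chosen port, stored in *port.
   Returns the listening descriptor, or -1 on failure. */
int listen_on_loopback(unsigned short *port) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int ld = socket(AF_INET, SOCK_STREAM, 0);
    if (ld < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(ld, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(ld, 8) != 0 ||
        getsockname(ld, (struct sockaddr *)&addr, &len) != 0) {
        close(ld);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return ld;
}

/* The service: echo bytes until the client stops sending. */
static void echo_service(int cd) {
    char buf[256];
    ssize_t n;
    while ((n = read(cd, buf, sizeof buf)) > 0)
        if (write(cd, buf, (size_t)n) != n)
            break;
}

/* Accept `count` connections on ld, forking one child per connection,
   then reap the children. Returns 0 on success, -1 on failure. */
int serve_forking(int ld, int count) {
    int started = 0;
    for (int i = 0; i < count; i++) {
        int cd = accept(ld, NULL, NULL);
        if (cd < 0)
            break;
        pid_t pid = fork();
        if (pid < 0) {
            close(cd);
            break;
        }
        if (pid == 0) {
            close(ld);          /* child does not accept new clients */
            echo_service(cd);   /* child handles this one request */
            _exit(0);
        }
        close(cd);              /* parent keeps only the listener */
        started++;
    }
    for (int i = 0; i < started; i++)
        wait(NULL);
    return started == count ? 0 : -1;
}
```

A blocked or crashed child affects only its own connection. The cost is one fork per request, which is the tradeoff that the prefork pattern avoids.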