Language-Based Security
- Guest lecture in CSE 605: Advanced Programming Languages
November 15, 2006
Shambhu Upadhyaya
CSE Department
University at Buffalo
Motivation

Conventional security mechanisms treat programs as black boxes:
- Encryption
- Firewalls
- System calls/privileged mode, access control

But they cannot address important current and future security needs:
- Downloaded, mobile code
- Buffer overruns and other safety problems
Threat Model

Remote exploit against a running process:
- Attacker can send in-band data
- Attacker can force execution to jump to the data

Language-based security is one way to deal with this.
Outline

- The need for language-based security
- Safety analyses and transformations
- Making languages safe
- Certifying compilation and verification
- Runtime Environment Driven Program Safety - a case study

Acknowledgements:
Fred Schneider, Greg Morrisett Tutorial (Cornell)
Ramkumar Chinchani (Cisco)
Computer security

Goal: prevent bad things from happening:
- Clients not paying for services
- Critical service unavailable
- Confidential information leaked
- Important information damaged
- System used to violate copyright

Conventional security mechanisms aren't up to the challenge.
1st Principle of Security Design

Least Privilege: each principal is given the minimum access needed to
accomplish its task [Saltzer & Schroeder '75]

Examples:
+ Administrators don't run day-to-day tasks as root, so "rm -rf /"
  won't wipe the disk.
- fingerd runs as root so it can access different users' .plan files.
  But then it can also "rm -rf /".

Problem: OS privilege is coarse-grained.
2nd Principle of Security Design

Keep the TCB small.

Trusted Computing Base (TCB): the components whose failure compromises
the security of a system.
- Example: the TCB of an operating system includes the kernel, the
  memory protection system, and the disk image.
- Small/simple TCB: correctness can be checked/tested/reasoned about
  more easily → more likely to work.
- Large/complex TCB: contains bugs enabling security violations.
Attack Implementation

- Stack-based overflows
- Heap-based overflows
- Format string attacks
Example Attack: buffer overflows

char buf[100];
...
gets(buf);

[Figure: the program stack, with sp pointing at buf; the attacker's
payload overwrites buf and the saved return address above it.]

- The attacker gives a long input that overwrites the function's return
  address and local variables.
- The "return" from the function transfers control to the payload code.
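For contrast, a minimal sketch of the unsafe idiom next to a bounded
alternative; the buffer size and function names here are illustrative:

#include <stdio.h>
#include <string.h>

void vulnerable(void) {
    char buf[100];
    gets(buf);                  /* no bound: a long line overwrites the
                                   saved return address above buf */
}

void safer(void) {
    char buf[100];
    if (fgets(buf, sizeof buf, stdin))    /* reads at most 99 chars */
        buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
}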
Example Attack: format strings

fgets(s, n, sock);
...
fprintf(output, s);

- Attack: pass a string s containing a %n qualifier (which writes the
  number of characters formatted so far to an arbitrary location).
- Use it to overwrite the return address to "return" to malicious
  payload code in s.

NB: neither attack is viable in a type-safe language such as Java.
But who's going to rewrite 50 Mloc in Java/C#?
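The repair is a one-line change: make the format string a constant. A
minimal sketch, assuming sock and output are already-open FILE * streams:

char s[512];
if (fgets(s, sizeof s, sock) != NULL) {
    /* fprintf(output, s);       vulnerable: s itself is parsed as a
                                 format string, so %n in s writes memory */
    fprintf(output, "%s", s);  /* safe: s is printed as plain data */
}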
1988: Morris Worm

Penetrated an estimated 6,000 machines (5-10% of hosts at that time).

Used a number of clever methods to gain access to a host:
- brute-force password guessing
- bug in default sendmail configuration
- X Windows vulnerabilities, rlogin, etc.
- buffer overrun in fingerd

Remarks:
- Buffer overruns account for roughly half of CERT advisories.
- You'd think we could solve this but...
1999-2000: Melissa & Love Bug

Both were email-based viruses that exploited:
- a common mail client (MS Outlook)
- trusting (i.e., uneducated) users
- VB scripting extensions within messages to:
  - look up addresses in the contacts database
  - send a copy of the message to those contacts
  - run with the full privileges of the user

Melissa: hit an estimated 1.2 million machines.
Love Bug: caused an estimated $10B in damage.
When to enforce security

Possible times to respond to security violations:
- Before execution: analyze, reject, rewrite
- During execution: monitor, log, halt, change
- After execution: rollback, restore, audit, sue, call police
Language-based techniques

A complementary tool in the arsenal: programs don't have to be black
boxes! Options:
1. Analyze programs at compile time or load time to ensure that they
   are secure.
2. Check analyses at load time to reduce the TCB.
3. Transform programs at compile/load/run time so that they don't
   violate security, or so that they log actions for later auditing.
Fixing The Problem

[Figure: the compiler pipeline, with checks at both ends.]
- On the source program: static analysis, model checking, type safety
- On the binary executable: runtime checks, anomaly detection
Maturity of language tools

Some things have been learned in the last 25 years...
- How to build a sound, expressive type system that provably enforces
  run-time type safety → protected interfaces
- Type systems expressive enough to encode multiple high-level
  languages → language independence
- How to build fast garbage collectors → trustworthy pointers
- On-the-fly code generation and optimization → high performance
[Figure: a taxonomy of existing tools (CQUAL, Cyclone, BOON, CCured,
BoundsChecker, StackGuard) by approach.]

- Type-Based (e.g., Cyclone, CCured):
  Pros: one-time effort; efficient
  Cons: weak type system; arbitrary typecasting
- Static Analysis (e.g., CQUAL, BOON):
  Pros: one-time effort
  Cons: undecidability of aliasing; false negatives
- Runtime Checks (e.g., BoundsChecker, StackGuard):
  Pros: coverage; few false positives
  Cons: inefficient
Security properties

What kinds of properties do we want to ensure programs or computing
systems satisfy?

Safety properties

- "Nothing bad ever happens"
- A property that can be enforced using only the history of the program
- Amenable to purely run-time enforcement
- Examples:
  - access control (e.g., checking file permissions on file open)
  - memory safety (a process does not read/write outside its own memory
    space)
  - type safety (data accessed in accordance with its type)
Information security: confidentiality

- Confidentiality: valuable information should not be leaked by
  computation.
- Also known as secrecy, though sometimes a distinction is made:
  - Secrecy: the information itself is not leaked
  - Confidentiality: nothing can be learned about the information
- Simple (access control) version:
  - Only authorized processes can read from a file.
  - But... when should a process be "authorized"?
- End-to-end version:
  - Information should not be improperly released by a computation, no
    matter how it is used.
  - Requires tracking information flow in the system. Encryption
    provides end-to-end secrecy, but prevents computation.
Information security: integrity

- Integrity: valuable information should not be damaged by computation.
- Simple (access control) version:
  - Only authorized processes can write to a file.
  - But... when should a process be "authorized"?
- End-to-end version:
  - Information should not be updated on the basis of less trustworthy
    information.
  - Requires tracking information flow in the system.
Privacy and Anonymity

- Privacy: a somewhat vague term encompassing confidentiality, secrecy,
  and anonymity.
  - Sometimes means: individuals (principals) and their actions cannot
    be linked by an observer.
- Anonymity: the identity of participating principals cannot be
  determined even if their actions are known.
  - Stronger than privacy.
Availability

- System is responsive to requests.
- DoS attacks: attempts to destroy availability (perhaps by cutting off
  network access).
- Fault tolerance: the system can recover from faults (failures) and
  remain available and reliable.
- Benign faults: not directed by an adversary.
  - The usual province of fault-tolerance work.
- Malicious or Byzantine faults: an adversary can choose the time and
  nature of the fault.
  - Byzantine faults are attempted security violations, usually limited
    by not knowing some secret keys.
Safety analysis and transformation
Reference Monitor

Observes the execution of a program and halts the program if it is
going to violate the security policy.

Common examples:
- memory protection
- access control checks
- routers
- firewalls

Most current enforcement mechanisms are reference monitors.
What policies?

- Reference monitors can only see the past.
- They can enforce all safety properties but not liveness properties.
- Assumptions:
  - the monitor can have access to the entire state of the computation
  - the monitor can have arbitrarily large state
  - but the monitor can't guess the future: the predicate it uses to
    determine whether to halt a program must be computable
Interpreter and sandboxing

Interpreter:
- easy to implement (small TCB)
- works with binaries (high-level language-independent)
- terrible execution overhead (25x? 70x?)

Sandboxing:
- Software Fault Isolation (SFI): code rewriting as "sandboxing"
- Requires that the code and data for a security domain lie in one
  contiguous segment
- Java sandbox model
- Restricts the functionality of the code
Type Safety and Security

Type-safe languages

Software-engineering benefits of type safety:
- memory safety
- no buffer overruns (an array subscript a[i] is only defined when i is
  in range for the array a)
- no worries about self-modifying code, wild jumps, etc.

Type safety can be used to construct a protected interface (e.g., a
system call interface) that applies access rules to requests.
Java

- Java is a type-safe language in which type safety is
  security-critical.
- Memory safety: programs cannot fabricate pointers to memory.
- Type safety: private fields and methods of objects cannot be accessed
  without using object operations.
- The bytecode verifier ensures that compiled bytecode is type-safe.
Security operations

- Each method has an associated protection domain (e.g., applet or
  local).
- doPrivileged(P){S}:
  - fails if the method's domain does not have privilege P
  - switches from the caller's domain to the method's while executing
    statement S (think setuid)
- checkPermission(P) walks up the stack S doing:

for (f := pop(S); !empty(S); f := pop(S)) {
    if domain(f) does not have priv. P then
        error;
    if f is a doPrivileged frame then break;
}

- Ensures the integrity of the control flow leading to a
  security-critical operation (see the C sketch below).
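A minimal C sketch that simulates the stack-inspection loop over an
explicit frame array; the types and fields are hypothetical stand-ins,
not the JVM's internal representation:

#include <stdbool.h>

typedef struct {
    bool has_priv;        /* does this frame's protection domain hold P? */
    bool is_privileged;   /* was this frame entered via doPrivileged(P)? */
} Frame;

/* Walk from the newest frame toward main(): fail on the first frame
   whose domain lacks the privilege, stop early at a doPrivileged frame. */
bool check_permission(const Frame *stack, int top) {
    for (int f = top; f >= 0; f--) {
        if (!stack[f].has_priv)
            return false;        /* error: unprivileged caller on stack */
        if (stack[f].is_privileged)
            break;               /* trust boundary: ignore older frames */
    }
    return true;
}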
Some pros and cons?

Pros:
- a rich, dynamic notion of context that tracks some of the history of
  the computation
- this could stop Melissa, Love Bug, etc.
- low overhead

Cons:
- implementation-driven (walking up stacks)
  - could be checked statically [Wallach]
- policy is smeared over the program
- possible to code around the limited history
  - e.g., by having applets return objects that are invoked after the
    applet's frames are popped
Require type safety?

- Write all security-critical programs in a type-safe high-level
  language (e.g., Java)?
- Problem 1: legacy code written in C, C++
  - Solution: a type-safe, backwards-compatible C
- Problem 2: sometimes need control over memory management
  - Solution: type-safe memory management
- Can we have compatibility, type safety, and low-level control? Can
  get 2 out of 3:
  - CCured [Necula et al. 2002]: emphasis on compatibility, memory
    safety
  - Cyclone [Jim et al. 2002]: emphasis on low-level control, type
    safety
CCured [Necula, 2002]

- Another type-safe C dialect
- Different pointer classes:
  - DYNAMIC: no info, slow, all accesses checked (memory-safe world)
  - SAFE: a memory- and type-safe pointer (or null)
  - SEQ: a pointer to an array of data (like Cyclone's fat pointers)
  - SAFE and SEQ live in the type-safe world; DYNAMIC in the
    memory-safe world
- Non-modular but fast C → CCured converter, based on the BANE
  constraint-solving framework
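To make the SEQ class concrete, here is a sketch of the kind of bounds
check a fat pointer implies; the struct layout is illustrative, not
CCured's actual representation:

#include <assert.h>
#include <stddef.h>

typedef struct {
    int *ptr;     /* current position   */
    int *base;    /* start of the array */
    int *end;     /* one past the end   */
} seq_int;        /* hypothetical SEQ (fat) pointer to int */

int seq_read(seq_int p, ptrdiff_t i) {
    assert(p.ptr + i >= p.base && p.ptr + i < p.end);  /* bounds check */
    return p.ptr[i];
}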
Certifying compilation

Code certification mechanisms

- Problem: can you trust the code you run?
- Code signing using digital signatures:
  - Too many signers
  - If you can't trust Microsoft, ...
- Idea: self-certifying code
  - The code consumer can check the code itself to ensure it's safe
  - The code includes annotations to make this feasible
  - Checking annotations is easier than producing them
  - A certifying compiler generates self-certifying code
- Java/JVM: the first real demonstration of the idea
PCC:

[Figure: PCC architecture. An optimizer and prover (which "could be
you") produce a "certified binary": machine code plus invariants plus a
proof. The system's trusted computing base is just the verifier, which
checks the binary against the security policy.]
Making "Proof" Rigorous:

Specify the machine-code semantics and security policy using axiomatic
semantics:

{Pre} ld r2, r1(i) {Post}

Given:
- a security policy (i.e., an axiomatic semantics and an associated
  logic for assertions)
- untrusted code, annotated with (loop) invariants

it's possible to calculate a verification condition:
- an assertion A such that if A is true, then the code respects the
  policy
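For instance, for a policy in which only addresses in a readable range
[lo, hi] may be loaded, a plausible verification condition for the
triple above is (a sketch, not Necula's exact rule):

Pre ⟹ lo ≤ r1 + i ≤ hi  ∧  Post[r2 ↦ mem(r1 + i)]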
Producer side

The code producer takes its code & the policy and:
- constructs some loop invariants
- constructs the verification condition A from the code, policy, and
  loop invariants
- constructs a proof that A is true

The shipped result is the "certified binary": code + invariants + proof.
Proof-carrying code

[Figure: the code producer compiles the safety policy into target code
plus a safety proof; the code consumer runs a verification-condition
generator to obtain safety conditions, then a proof checker to validate
the proof against them.]
Code consumer side

The verifier (~5 pages of C code):
- takes the code, loop invariants, and policy
- calculates the verification condition A
- checks that the proof is a valid proof of A:
  - fails if some step doesn't follow from an axiom or inference rule
  - fails if the proof is valid, but not a proof of A

The input is the "certified binary": code + invariants + proof.
Advantages of PCC

A generic architecture for providing and checking safety properties.

In principle:
- Simple, small, and fast TCB
- No external authentication or cryptography
- No additional run-time checks
- "Tamper-proof"
- Precise and expressive specification of code safety policies

In practice:
- Still hard to generate proofs for properties stronger than type
  safety. Need a certifying compiler...
Summary

- Extensible, networked systems need language-based security
  mechanisms.
- Analyze and transform programs before running:
  - Software fault isolation, inlined reference monitors
- Use safe programming languages to defend against attacks and for
  fine-grained access control:
  - Java, C#, CCured, Cyclone
- Certifying compilation to simplify the TCB:
  - JVM, TAL, PCC
- Static analysis of end-to-end system security:
  - Noninterference, static information flow, ...
Other work and future challenges

- Lightweight, effective static analyses for common attacks (buffer
  overflows, format strings, etc.)
- Security types for secrecy in network protocols
- Self-certifying low-level code for object-oriented languages
- Applying interesting policies to PCC/IRM
- Secure information flow in concurrent systems
- Enforcing availability policies

See website for bibliography and more tutorial information:
www.cs.cornell.edu/info/People/jgm/pldi03
Runtime Environment Driven Program Safety - a case study
Making the case

- Static analysis is a one-time effort but has poor coverage.
- Runtime checks have good coverage, but per-variable checks are
  inefficient.
- Type-based safety is efficient but can be coarse-grained.
Motivation

- A new vulnerability class: the integer overflow vulnerability
- Recently seen in openssh, pine, Sun RPC, and several other software
  packages
- Cause: an attacker-controlled integer variable
Integer Overflow Attack

void *alloc_mem(u_short size)
{
    u_short pad_size = 16;
    size = size + pad_size;  /* size = 65535 wraps around to 15 !! */
    return malloc(size);     /* returns a smaller buffer than requested */
}
Program Security Is NOT Portable!

[Figure: the same source or binary code that is safe on a 32-bit
runtime environment can be unsafe on a 16-bit one.]
Buffer Overflows And Runtime Environment

Intel:
- Immediate caller's return address is on the stack
- For a successful attack, only the callee has to return
- Variable-width instruction set

PowerPC:
- Immediate caller's return address is in the link register
- For a successful attack, both the callee and the caller have to
  return
- Fixed-width instruction set

NOTE: StackGuard has only been implemented for the x86 architecture.
Porting to other architectures should be easy, but we haven't done it
yet.
Various Runtime Environments

Observations:
- Vulnerabilities are specific to a runtime environment.
- CERT incident reports contain information such as architecture,
  software distribution, version, etc.
- Programming-language-level analysis is not adequate:
  - Machine word size determines the behavior of numerical types.
  - The operating system and compiler determine the memory layout.
Approach Outline

Runtime Environment Driven Program Safety:
- Infer safety properties in the context of the runtime environment.
- Enforce these properties.
- Java-like, except with no JVM.
Overall Goal

[Figure: the same source or binary code, made safe across multiple
runtime environments RE 1, RE 2, and RE 3.]
Basic Methodology

A type-based safety approach:
- Runtime-dependent interpretation
- Not merely an abstraction, but using actual values
- No new types
- Also, can be efficient

Prototype Implementation: ARCHERR
- Implemented as a parser using flex and bison
- Currently works on the 32-bit Intel/Linux platform
Detecting Integer Overflows

- Machine word size is an important factor (e.g., the Intel XScale
  processor: 16-bit, now with a 32-bit version; the Intel Pentium
  processor: 32-bit).
- Main idea: analyze assignment and arithmetic operations in the
  context of the machine word size.
Integers: Classical View

Assignment:
  x : int → x ∈ I, where I = (-∞, +∞)
  x, y : int allows x = y

Arithmetic:
  succ(x : int) = x + 1
  pred(x : int) = x - 1
Integers: Runtime Dependent View

Integer Arithmetic Safety Checks

if x ≥ 0, y ≥ 0:  x + y → assert: x ≤ (MAXINT - y)
if x ≥ 0, y < 0:  x - y → assert: x ≤ (MAXINT + y)
if x < 0, y ≥ 0:  x - y → assert: x ≥ (MININT + y)
if x < 0, y < 0:  x + y → assert: x ≥ (MININT - y)

for all x, y:
x × y → assert: x ≥ MININT/y ∧ x ≤ MAXINT/y
x ÷ y → assert: y ≠ 0
x % y → assert: y ≠ 0
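A minimal C sketch of the addition and division assertions above,
collapsing the four sign cases into two; the function names are
hypothetical, and ARCHERR emits such checks as source annotations:

#include <assert.h>
#include <limits.h>

/* x + y is representable iff these assertions hold for the target's
   INT_MAX / INT_MIN, i.e., for its machine word size. */
void check_add(int x, int y) {
    if (y >= 0)
        assert(x <= INT_MAX - y);   /* would wrap past MAXINT */
    else
        assert(x >= INT_MIN - y);   /* would wrap past MININT */
}

void check_div(int x, int y) {
    (void)x;
    assert(y != 0);                 /* the same guard covers x % y */
}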
Other Numerical Types

- short, long, unsigned short/long, etc.: similar analysis
- float, double, long double:
  - Floating-point values use a standard IEEE format
  - The analysis is more complex
  - But floating-point arithmetic is discouraged for efficiency reasons
Other Operators

- Bitwise operators:
  - << : multiplication by 2
  - >> : division by 2 (is safe)
- Logical operators? Not exactly arithmetic in nature.
In A Program?

int foo(int x, int y)
{
    VALIDATE_ADD_INT(x, y);  /* compile-time annotation; expands to a
                                16-bit or 32-bit runtime check */
    return (x + y);
}
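What the annotation might expand to: the same source line compiles into
different checks because limits.h differs per target. A sketch with a
hypothetical macro definition:

#include <assert.h>
#include <limits.h>   /* INT_MAX is 32767 on a 16-bit target,
                         2147483647 on a 32-bit target */

#define VALIDATE_ADD_INT(x, y) \
    assert((y) >= 0 ? (x) <= INT_MAX - (y) : (x) >= INT_MIN - (y))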
A High-Level View

What have we achieved, actually?

[Figure: properties of types in the classical sense (a programmer's
view) are automatically converted into safety properties for each
runtime environment, RE 1 and RE 2 (an attacker's view).]
Extending Idea To Pointers

- Common concept of segments: data, text, stack
- But differences in the actual layout

[Figure: 32-bit process address maps for Linux and Windows NT. On
Linux, user space spans 0-3 GB (up to 0xBFFFFFFF) and system space
3-4 GB (up to 0xFFFFFFFF); the Windows NT layout differs.]
Similarities/Differences Between Integers and Pointers

- Pointers are represented as "unsigned ints": a machine word
  represents the entire virtual address space.
- Both pointers and integers have a type.
- However, the arithmetic is different:
  - integers increment by 1
  - pointers increment by the size of the type they point to (see the
    sketch below)
- Valid values that they can assume?
  - For integers, just one interval
  - For pointers, several disjoint intervals
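A two-line illustration of the scaling difference (prints 4 on a 32-bit
target where sizeof(int) == 4):

#include <stdio.h>

int main(void) {
    int a[4] = {0, 1, 2, 3};
    int *p = a;
    int n = 7;
    printf("%ld\n", (long)((char *)(p + 1) - (char *)p)); /* 4: p steps by sizeof(int) */
    printf("%d\n", n + 1);                                /* 8: integers step by 1 */
    return 0;
}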
Bookkeeping Examples

char buffer[256];
char *p;

int main() {
    ADD_GLOBAL(buffer, sizeof(buffer));  /* register global ranges */
    ADD_GLOBAL(&p, sizeof(p));
    ...
}

int foo() {
    char buffer[256];
    ADD_STACK_FRAME();   /* register this frame's locals */
    ...
    DEL_STACK_FRAME();   /* unregister on exit */
    return 0;
}

Reading stack frames is currently implemented through the ESP and EBP
register values.
Pointers: Runtime Dependent View

- Safe pointer assignment: a pointer variable p, which points to
  variables of type τ, is denoted by p : q(τ).
- Safe pointer arithmetic: the following must obey the above rule.
Pointer Assignment Scenarios

Pointer Check Examples

VALIDATE_PTR(q);
p = q;
    /* Is q a valid pointer? Is [q, q + sizeof(*q)) inside one range? */

VALIDATE_PTR(&p[i]);
p[i] = 2;
    /* Is &p[i] a valid pointer? Is [&p[i], &p[i] + sizeof(p[i]))
       inside one range? */

VALIDATE_PTR_ADD(p, 1);
p++;
    /* Is p a valid pointer? Is [p, p + sizeof(*p)) inside one range?
       Is p + 1 a valid pointer belonging to the same address range? */
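A minimal sketch of the range lookup behind VALIDATE_PTR, checking a
table of live [start, start + size) intervals; the table layout is
hypothetical, populated by the ADD_GLOBAL / ADD_STACK_FRAME bookkeeping
shown earlier:

#include <stddef.h>

typedef struct { const char *start; size_t size; } Range;

static Range ranges[1024];   /* filled in by the bookkeeping calls */
static size_t nranges;

/* Valid iff [p, p + len) lies entirely inside one recorded range. */
int validate_ptr(const void *p, size_t len) {
    const char *q = (const char *)p;
    for (size_t i = 0; i < nranges; i++) {
        const char *s = ranges[i].start;
        if (q >= s && q + len <= s + ranges[i].size)
            return 1;
    }
    return 0;   /* not inside any live range */
}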
Additional Pointer Issues

- Function pointers:
  - If not protected, can lead to arbitrary code execution
  - Maintain a separate list of function addresses and check against it
- Typecasting is a feature in C:
  - Private fields in structures through void *
  - Leads to dynamic types
Optimizations

- Remove unnecessary checks using static analysis (currently, for
  integer arithmetic).
- Speed up memory range lookups:
  - Maintain separate FIFO lists for stack, data, and heap.
  - Pointer assignment is "safe"; dereferencing is not.
- Optimize initialization loops.
Security Testing

- Does this approach actually work?
- Real-world programs
- Vulnerabilities and exploits available at the SecurityFocus website

Program             | Vulnerability               | Detected?
sendmail (8.11.6)   | Stack-based buffer overflow | YES
GNU indent (2.2.9)  | Heap-based buffer overflow  | YES
man (1.5.1)         | Format string               | NO
pine (4.56)         | Integer overflow            | YES
Performance Testing

- Scimark2 benchmark
- 32-bit Intel/Linux 2.6.5
- Compared against CCured and BoundsChecker

Tool                        | Performance hit (slowdown)
CCured                      | 1.5x
BoundsChecker               | 35x
ARCHERR w/o pointer checks  | 2.3x
ARCHERR with pointer checks | 2.5x
Impact On Code Size

Source code annotations cause bloat:
- Source code bloat: 1.5 - 2.5x
- Runtime image bloat: 1.2 - 1.4x
Limitations

- Requires source code annotations, so there are no guarantees if code
  is not annotated.
  - Write wrappers for well-documented binary libraries, e.g., checking
    the bound 32 in strncpy(dst, src, 32) (see the sketch after this
    list).
- May be considered inefficient for high-performance applications.
  - Security has a price, but it is not too high.
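A sketch of such a wrapper for strncpy, reusing the validate_ptr lookup
sketched earlier; the wrapper name is hypothetical:

#include <assert.h>
#include <string.h>

int validate_ptr(const void *p, size_t len);   /* from the earlier sketch */

/* Check that dst really has n writable bytes before delegating to the
   unannotated library routine. */
char *checked_strncpy(char *dst, const char *src, size_t n) {
    assert(validate_ptr(dst, n));
    return strncpy(dst, src, n);
}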
Features

- Portable safety is runtime environment dependent.
- First work to show a systematic way to detect/prevent integer
  overflow attacks (currently on one architecture).
- Extended the idea to detect/prevent memory-based attacks (again on
  one architecture).
- Security testing and performance evaluation.
[Figure: the tool taxonomy from earlier (Type-Based: Cyclone, CCured;
Static Analysis: CQUAL, BOON; Runtime Checks: BoundsChecker,
StackGuard), now with ARCHERR positioned across the approaches: a
one-time, type-based annotation effort enforced by runtime checks.]
Current Status And Future Work

- Code to be released soon (currently research grade).
- Investigating implementation on other runtime environments:
  - 32-bit Intel/Windows PE32
  - 32-bit Intel/FreeBSD ELF
  - 32-bit SPARC/ELF
- Improve efficiency?
  - rndARCHERR: randomized runtime checks
  - Static-analysis-driven optimizations
Reference

Ramkumar Chinchani, Anusha Iyer, Bharat Jayaraman, and Shambhu
Upadhyaya. ARCHERR: Runtime Environment Driven Program Safety.
ESORICS 2004.
http://www.cse.buffalo.edu/~rc27/publications/chinchani-ESORICS04-final.pdf