The Bugs and the Bees
Research in Programming
Languages and Security
David Evans
evans@cs.virginia.edu
http://www.cs.virginia.edu/~evans
University of Virginia
Department of Computer Science
Computer Science
• “How to” knowledge:
– Ways of describing imperative processes (computations)
– Ways of reasoning about (predicting) what imperative processes will do
• Most interesting CS problems concern:
– Better ways of describing computations
– Ways of reasoning about what they do (and don’t do)
23 Sept 2002
David Evans - CS696
2
My Research Projects
The Bugs – Splint
How can we detect code that describes unintended computations?
The Bees – “Programming the Swarm”
How can we program massively distributed collections of simple devices and reason about their behavior in hostile environments?
A Gross Oversimplification
[Figure: Bugs Detected (none to all) versus Effort Required (Low to Unfathomable), with Compilers and Formal Verifiers marked]
(Almost) Everyone Likes Types
• Easy to Understand
• Easy to Use
• Quickly Detect Many Programming Errors
• Useful Documentation
• …even though they are lots of work!
– 1/4 of the text of a typical C program is for types
Limitations of Standard Types
Standard Types                     | Attributes
Type of reference never changes    | State changes along program paths
Language defines checking rules    | System or programmer defines checking rules
One type per reference             | Many attributes per reference
Approach
• Programmers add annotations (formal specifications)
– Simple and precise
– Describe programmers’ intent:
• Types, memory management, data hiding, aliasing, modification, nullity, buffer sizes, security, etc.
• Splint detects inconsistencies between annotations and code
– Simple (fast!) dataflow analyses
Security Flaws
Reported flaws in the Common Vulnerabilities and Exposures Database, Jan-Sep 2001:
– Other: 16%
– Malformed Input: 16%
– Buffer Overflows: 19%
– Format Bugs: 6%
– Resource Leaks: 6%
– Pathnames: 10%
– Access: 16%
– Symbolic Links: 11%
• 190 vulnerabilities, only 4 having to do with crypto
• 108 of them could have been detected with simple static analyses!
[Evans & Larochelle, IEEE Software, Jan 2002.]
Example: Buffer Overflows
David Larochelle
• Most commonly exploited security vulnerability
– 1988 Internet Worm
– Still the most common attack
• Code Red exploited a buffer overflow in IIS
• >50% of CERT advisories, 23% of CVE entries in 2001
• Attributes describe sizes of allocated buffers
• Heuristics for analyzing loops
• Found several known and unknown buffer overflow vulnerabilities in wu-ftpd
Some Open Issues
• Differential Program Analysis [Joel Winstead]
– We usually don’t have just one program; we have many versions of similar programs
– How can we discover interesting differences between two versions of a program?
• e.g., find a test case that reveals the difference, or find invariants that differ
• Design-level Properties
– Can we develop annotations and checks that deal with design-level properties?
• Integrate run-time checking
– Combine static and run-time checking to enable additional checking and completeness guarantees
Splint
• More information: splint.org
– IEEE Software ’02, USENIX Security ’01, PLDI ’96
• Public release – real users, mentioned in C FAQ, C Unleashed, Linux Journal, etc.
• Students (includes other PL/SE/security related projects):
– David Larochelle: buffer overflows, automatic annotations
– Joel Winstead: differential program analysis
– Greg Yukl: source code generation
• Current Funding: NASA (joint with John Knight)
Programming the Swarm
Really Brief History of Computer Science
1950s: Programming in the small...
Programmable computers
Learned that programming is hard
Birth of higher-order languages
Tools for reasoning about trivial programs
1970s: Programming in the large...
Abstraction, objects
Methodologies for development
Tools for reasoning about component-based systems
2000s: Programming the Swarm!
What’s Changing
• Execution Platforms
– Small, cheap and unreliable
– Limited power – communication is expensive
• Execution environment
– Interact with physical world
– Unpredictable, dynamic
• Programs
– Old style of programming won’t work
– Is there a new paradigm?
Programming the Swarm: Long-Range Goal
[Figure: “Cement” and “10 GFlop”]
Why Might This Be Possible?
• We are surrounded by systems that:
– Contain 50 trillion (5 × 10^13) components
– Continue to function when 50 million components fail every second
– Survive in hostile environments (even Canada!)
– Self-organize starting from a single component and a program that is smaller than Windows XP
A Biological Programming Model
Selvin George
• Program systems the way biology does
• Literal interpretation:
– Cells can change state (genes turn on and off)
– Cells can divide
• Asymmetrically
– Cells can communicate over short distances
• Chemical diffusion
Example
Cell Program:
    state s1 {
        transitions
            -> (s1, s1) normal;
    }
Cell Programs
• Use chemicals to control development
• How can we produce cell programs that generate particular structures?
• How can we reason about the behavior of cell programs in the presence of failures and randomness?
• How can we describe cell programs at a higher level? (Making abstractions)
Less Literal Interpretation
• Learn about self-organization and robustness by mimicking biology
– Learn principles from biology, not programs
• Use this to build real systems
– Sensor networks
– Distributed file sharing
Sensor Networks
[Figure: a high-power base station and thousands of small, low-powered devices with sensors and actuators, communicating wirelessly]
Sensor Networks
[Figure: the high-power base station, a compromised node, and an enemy base station]
Security for Sensor Networks
• Control Messages
– Only messages from the base station (or other nodes) should change device behavior
• Data Collection
– A few compromised nodes should not be able to prevent or tamper with data collection
• Data Confidentiality
– In some applications, an eavesdropper should not be able to interpret messages
Why security for sensor networks is hard
• Low-power devices
– Cannot run traditional public-key algorithms
• Limited device communication
– Sending messages is extremely expensive
• Communication is wireless
– All messages are vulnerable to eavesdropping and forgery
• Devices start identical – no stored secrets
Asymmetric Cryptography
• Cryptography depends on either:
– Shared secrets
– Asymmetry (normally of information)
• Exploit time and space asymmetries
– Public-key systems get asymmetry because only one party knows the private key
– In sensor networks, we can get asymmetry from time (a key is revealed later, but in a verifiable way) and space (only nodes within a certain distance can hear)
Non-Cryptographic Techniques
• Redundancy
– Lots of sensors; only a few will be compromised or bogus
• Snooping
– Because communication is wireless, nodes can hear what their neighbors are saying
– If they are lying, tattle!
Programming the Swarm
swarm.cs.virginia.edu
• Students:
– Selvin George: Biological Programming Model
– Undergraduates: Keen Browne, Jacques Fournier, Chris Frost, Ami Malaviya, Jon McCune
• Funding: NSF Career Award, NSF ITR
Summary
• Programming the Swarm: Describing and reasoning about the behavior of large ad hoc collections in hostile environments
• Splint: Detecting differences between what programs express and what programmers intend
• Be proactive about finding an advisor
– Most important decision you will make in grad school
– The matching process is a last resort
• Email to arrange meetings: evans@cs.virginia.edu