The Bugs and the Bees
Research in Programming Languages and Security

David Evans
evans@cs.virginia.edu
http://www.cs.virginia.edu/~evans
University of Virginia
Department of Computer Science

23 Sept 2002 David Evans - CS696

Computer Science
• “How to” knowledge:
  – Ways of describing imperative processes (computations)
  – Ways of reasoning about (predicting) what imperative processes will do
• Most interesting CS problems concern:
  – Better ways of describing computations
  – Ways of reasoning about what they do (and don’t do)

My Research Projects
The Bugs – Splint
  How can we detect code that describes unintended computations?
The Bees – “Programming the Swarm”
  How can we program massively distributed collections of simple devices and reason about their behavior in hostile environments?

A Gross Oversimplification
[Figure: bugs detected (none to all) plotted against effort required (low to unfathomable), with compilers and formal verifiers marked.]

(Almost) Everyone Likes Types
• Easy to Understand
• Easy to Use
• Quickly Detect Many Programming Errors
• Useful Documentation
• …even though they are lots of work!
  – 1/4 of text of typical C program is for types

Limitations of Standard Types
Standard types                     Attributes
Type of reference never changes    State changes along program paths
Language defines checking rules    System or programmer defines checking rules
One type per reference             Many attributes per reference

Approach
• Programmers add annotations (formal specifications)
  – Simple and precise
  – Describe programmer’s intent:
    • Types, memory management, data hiding, aliasing, modification, null-ity, buffer sizes, security, etc.
• Splint detects inconsistencies between annotations and code
  – Simple (fast!) dataflow analyses

Security Flaws
Category            Share of reported flaws
Other               16%
Malformed Input     16%
Buffer Overflows    19%
Format Bugs         6%
Resource Leaks      6%
Pathnames           10%
Access              16%
Symbolic Links      11%

190 Vulnerabilities
Only 4 having to do with crypto
108 of them could have been detected with simple static analyses!
Reported flaws in Common Vulnerabilities and Exposures Database, Jan-Sep 2001. [Evans & Larochelle, IEEE Software, Jan 2002.]

Example: Buffer Overflows (David Larochelle)
• Most commonly exploited security vulnerability
  – 1988 Internet Worm
  – Still the most common attack
    • Code Red exploited buffer overflow in IIS
    • >50% of CERT advisories, 23% of CVE entries in 2001
• Attributes describe sizes of allocated buffers
• Heuristics for analyzing loops
• Found several known and unknown buffer overflow vulnerabilities in wu-ftpd

Some Open Issues
• Differential Program Analysis [Joel Winstead]
  – We usually don’t just have one program, we have lots of versions of similar programs
  – How can we discover interesting differences between two versions of a program?
    • e.g., find a test case that reveals the difference, find invariants that are different
• Design-level Properties
  – Can we develop annotations and checks that deal with design-level properties?
• Integrate run-time checking
  – Combine static and run-time checking to enable additional checking and completeness guarantees

Splint
• More information: splint.org
  IEEE Software ’02, USENIX Security ’01, PLDI ’96
• Public release – real users, mentioned in C FAQ, C Unleashed, Linux Journal, etc.
• Students (includes other PL/SE/security related projects):
  – David Larochelle: buffer overflows, automatic annotations
  – Joel Winstead: differential program analysis
  – Greg Yukl: source code generation
• Current Funding: NASA (joint with John Knight)

Programming the Swarm

Really Brief History of Computer Science
1950s: Programming in the small...
  Programmable computers
  Learned that programming is hard
  Birth of higher-order languages
  Tools for reasoning about trivial programs
1970s: Programming in the large...
  Abstraction, objects
  Methodologies for development
  Tools for reasoning about component-based systems
2000s: Programming the Swarm!

What’s Changing
• Execution Platforms
  – Small, cheap and unreliable
  – Limited power – communication is expensive
• Execution environment
  – Interact with physical world
  – Unpredictable, dynamic
• Programs
  – Old style of programming won’t work
  – Is there a new paradigm?

Programming the Swarm: Long-Range Goal
[Figure labels: Cement; 10 GFlop]

Why this Might be Possible?
• We are surrounded by systems that:
  – Contain 50 trillion (5 × 10^13) components
  – Continue to function when 50 million components fail every second
  – Survive in hostile environments (even Canada!)
  – Self-organize starting from a single component and a program that is smaller than Windows XP

A Biological Programming Model (Selvin George)
• Program systems the way biology does
• Literal interpretation:
  – Cells can change state (genes turn on and off)
  – Cells can divide
    • Asymmetrically
  – Cells can communicate over short distances
    • Chemical diffusion

Example Cell Program
    state s1 {
      transitions
        -> (s1, s1) normal;
    }

Cell Programs
• Use chemicals to control development
• How can we produce cell programs that generate particular structures?
• How can we reason about the behavior of cell programs in the presence of failures and randomness?
• How can we describe cell programs at a higher level? (Making abstractions)

Less Literal Interpretation
• Learn about self-organization and robustness by mimicking biology
  – Learn principles from biology, not programs
• Use this to build real systems
  – Sensor networks
  – Distributed file sharing

Sensor Networks
High-power base station
Thousands of small, low-powered devices with sensors and actuators, communicating wirelessly

Sensor Networks
High-power base station
Compromised Node!
Enemy base station

Security for Sensor Networks
• Control Messages
  – Only messages from base station (or other nodes) should change device behavior
• Data Collection
  – A few compromised nodes should not be able to prevent or tamper with data collection
• Data Confidentiality
  – Some applications: eavesdropper shouldn’t be able to interpret messages

Why security for sensor networks is hard
• Low power devices
  – Cannot do traditional public-key algorithms
• Limited device communication
  – Sending messages is extremely expensive
• Communication is wireless
  – All messages are vulnerable to eavesdropping and forgery
• Devices start identical – no stored secrets

Asymmetric Cryptography
• Cryptography depends either on:
  – Shared secrets
  – Asymmetry (normally of information)
• Exploit time and space asymmetries
  – Public-key systems get asymmetry by only one party knowing private key
  – In sensor networks, we can get asymmetry by using time (key is revealed later, but in a verifiable way) and space (only nodes within a certain distance can hear)

Non-Cryptographic Techniques
• Redundancy
  – Lots of sensors, only a few will be compromised or bogus
• Snooping
  – Because communication is wireless, nodes can hear what their neighbors are saying
  – If they are lying, tattle tale!
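
A rough sketch of how the redundancy and snooping ideas above might combine, under assumptions that are not from the slides: each node overhears one numeric reading per neighbor, uses the median as its estimate, and flags neighbors whose reading is far from that median. The function names, node IDs, readings, and tolerance are all hypothetical, and a real sensor network would need something more careful than a fixed threshold.

```python
from statistics import median


def robust_estimate(readings):
    """Median of the overheard readings.

    The median is unaffected by a minority of bogus values, which is the
    redundancy argument: lots of sensors, only a few compromised.
    """
    return median(readings.values())


def suspicious_nodes(readings, tolerance):
    """IDs of nodes whose reading is more than `tolerance` from the median.

    This stands in for snooping: a node hears its neighbors and reports
    the ones that appear to be lying. `tolerance` is an assumed,
    application-specific threshold.
    """
    estimate = robust_estimate(readings)
    return sorted(
        node for node, value in readings.items()
        if abs(value - estimate) > tolerance
    )


# Hypothetical temperature readings overheard by one node.
readings = {"n1": 20.1, "n2": 19.8, "n3": 20.4, "n4": 95.0, "n5": 20.0}

estimate = robust_estimate(readings)
flagged = suspicious_nodes(readings, tolerance=2.0)
print(estimate, flagged)  # prints: 20.1 ['n4']
```

The median only helps while compromised nodes stay in the minority of what a node can hear, which matches the slide's assumption that only a few sensors are compromised or bogus.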

Programming the Swarm
swarm.cs.virginia.edu
• Students:
  – Selvin George: Biological Programming Model
  – Undergraduates: Keen Browne, Jacques Fournier, Chris Frost, Ami Malaviya, Jon McCune
• Funding: NSF Career Award, NSF ITR

Summary
• Programming the Swarm: Describing and reasoning about behavior of large ad hoc collections in hostile environments
• Splint: Detecting differences between what programs express and what programmers intend
• Be proactive about finding an advisor
  – Most important decision you will make in grad school
  – Matching process is last resort
• Email to arrange meetings: evans@cs.virginia.edu