GENESIS: A Framework For Achieving Component Diversity
John C. Knight, Jack W. Davidson, David Evans, Anh Nguyen-Tuong (University of Virginia)
Chenxi Wang (Carnegie Mellon University)
Nice Meeting Facility!
DARPA SRS Kickoff

What Is The Problem?
- Many machines with the same vulnerability
- What is a vulnerability?
  - A vulnerability is a fault in the classic sense of dependability theory
- Fault types:
  - Degradation: something breaks in one copy
  - Design: flaw in design affects all copies
- Software faults are design faults

Redundancy & Degradation Faults
[Figure: N Modular Redundant (NMR) System. Inputs feed identical computers Computer1, Computer2, ..., ComputerN, whose results go to a Voter that produces the Outputs. Other labels: Error Detection, Damage Assessment, State Restoration, Continued Service.]

Redundancy & Design Faults
- Redundancy is diversity
- Works well for degradation faults:
  - Faults have predictable statistical behavior
  - Effective mathematical models available
- What about design faults?
  - Simple replication doesn't work, obviously
  - Requires different (diverse) designs to be effective

Multiple Systems
[Figure labels: Specification; Linux, Windows, OS/2; Vulnerabilities]

Design Diversity
[Figure: a Component Specification feeds Version Development 1, Version Development 2, ..., Version Development N, followed by System Assembly. Other labels: Development Interaction Barriers, Technology Restrictions.]
Goal: Different Faults Because Of Independent Development

Design Diverse System
[Figure: N Version System. Inputs go to Version1, Version2, ..., VersionN, whose results go to a Voter that produces the Outputs. Annotation: How "Different"?]
Assumption: Different Faults Because Of Independent Development

Design Diversity
- Does not work well for design faults
- No upper bound on failure probability
- No practical statistical models
- No definition of "design diversity"
- No procedure for achieving it
- Linux vs. Windows is, however, worse: it is purely ad hoc
- But, what else is there?
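The voter shared by the NMR and N-version architectures above can be sketched in a few lines. This is a minimal illustration, not code from the GENESIS project: the names `majority_vote`, `run_n_version`, `good_abs`, `other_abs`, and `faulty_abs` are hypothetical, and the toy "design fault" is an assumption chosen only to show why simple replication does not mask design faults.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value produced by a strict majority of the versions."""
    if not outputs:
        raise ValueError("no outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among version outputs")
    return value


def run_n_version(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Feed the same input to every version and vote on the results."""
    return majority_vote([version(x) for version in versions])


# Hypothetical versions of an absolute-value component.
def good_abs(x: int) -> int:
    return x if x >= 0 else -x


def other_abs(x: int) -> int:
    return max(x, -x)


def faulty_abs(x: int) -> int:
    # Assumed design fault: negative inputs are returned unchanged.
    return x


# Diverse versions: the single faulty version is outvoted.
diverse_result = run_n_version([good_abs, other_abs, faulty_abs], -3)

# Identical copies: every copy has the same design fault, so the
# voter unanimously accepts the wrong answer.
replicated_result = run_n_version([faulty_abs] * 3, -3)
```

With diverse versions the vote yields 3; with three copies of the faulty version it yields -3, which is the failure mode the "simple replication doesn't work" bullet refers to.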
Data Diversity
- Heisenbug (Jim Gray):
  - Program fails
  - Sometimes if you rerun the program, it works
  - Applied to Tandem operating system
  - We all do this in daily operation
- Several variants of approach developed
- Comprehensive, general approach developed: data diversity

Data Diverse System
[Figure: N Copy Architecture, Same Software. Inputs pass through Data Reexpression into Copy1, Copy2, ..., CopyN; each copy's result passes through Reverse Data Reexpression to the Voter.]

Data Diversity
- Low cost: software is copied
- Unknown performance for design faults
- Experimental evidence that it works well
- Can be very powerful (angles in degrees):
  sin(x) = sin(a + b) = sin(a)cos(b) + cos(a)sin(b) = sin(a)sin(90 - b) + sin(90 - a)sin(b)
  Choose a and b, repeat, vote

The Vision
[Figure: Diversity Specifications and Software enter the GENESIS Diversity Engine, which produces a diverse population of functionally-equivalent software.]
- Automated production of design-diverse, functionally-equivalent software
- Automatic production of data-diverse, functionally-equivalent software
- It might work...

Overall Approach
[Figure: Diversity Specifications and Software enter the GENESIS Diversity Engine, which produces a diverse population of functionally-equivalent software.]
- Analysis of the diversity space
- Automated production of functionally-equivalent software and data:
  - Compiler and meta-compiler technology
  - Virtual Machine Technology
  - Source-level transformations
  - Compiler transformations
  - Data stream rewriting
  - Run-time software translation techniques
- Rationale that diversity is an effective defense mechanism:
  - Experimental evaluation
  - Modeling of effects of diversity on known vulnerabilities
  - Application to COTS software

Hierarchic Design Diversity
[Figure: a Software Application passes through Source-to-Source Transformations to yield Source Code Version 1 through Source Code Version N; Compiler Transformations turn each version into multiple binaries (Binary 1, Binary 2, ..., Binary i).]
[Figure label, last stage of the Hierarchic Design Diversity diagram: Run-time Transformations]

Source to Source Transformations
- Underlying model of tasks: e.g. fork/execs vs. threads
- Process interaction: e.g. low-level semaphores vs. higher-level monitors
- Fundamental libraries: e.g. libc, sockets, etc.
- Diversity achieved by component combinations

Compiler Transformations
- Generate N compilers that target different architectures
- Manipulate formal description of target architecture, written in the Computer Systems Description Language (CSDL):
  - Instruction Set Architecture (ISA) specification
  - Calling convention specification
- Example diversity techniques:
  - Different calling conventions
  - ISA subsets created, enforced dynamically
  - Memory layouts: code and data
  - Implement the above within the same program

Run-time Transformations
- Software Dynamic Translation
- STRATA system:
  - Layer between hardware and application
  - Designed to be easily retargeted
- Virtual machine provides:
  - Underlying target
  - Supplementary rules on use of target
- Software Dynamic Translation systems:
  - FX!32
  - Dynamo
  - Transmeta

STRATA: Basic Operation
[Figure: flowchart of the SDT Virtual Machine. Labels: Context Capture, New PC, Cached? (Yes/No), New Fragment, Fetch, Decode, Translate, Next PC, Finished? (Yes/No), Enforce Desired Policies, Context Switch, Host CPU (Executing Translated Code from Cache).]

Example STRATA Policies
- Apply compile-time transformations dynamically: rearrangement of basic blocks, calling sequence transformations, etc.
- Dynamic injection and enforcement of behavioral policies, e.g.
resource usage (files, sockets, tasks)
- Language diversity: dialects
  - Only allow subsets of original instruction set
  - Vary subsets dynamically

STRATA System Architecture
[Figure: layered architecture. Labels: Application; Strata Virtual Machine with Machine Independent Components (Context Management, Memory Management, Cache Management, Linker, Strata Virtual CPU), Target Interface, and Target Specific Functions; Host CPU.]

Data Diversity
- Diversity in the data space can avoid sequences of events that lead to failure
- Diversity space offers large range of data re-expression options:
  - Precision (exact, approximate)
  - Locality (internal, external)
  - Sequence (in-order/on-time, in-order/off-time, out-of-order/on-time, out-of-order/off-time)
[Figure: data diversity space drawn on three axes: Sequence, Locality, Precision.]

Data Re-expression Examples
- Change floating point values: lose precision, translate, rotate
- Data sequences: reorder data, change timing of data
- Memory layout (code and data)
- Reorder transactions
- Reorder data in activation records
- SQL rewriting
- …many more examples…

Data Re-expression Space
- These examples are ad hoc
- Proposals in literature are ad hoc
- So: use data re-expression space categorization to drive exploration of diversity techniques (instead of point solutions)

Evaluation
- Theoretical:
  - Modeling of effects of diversity on network vulnerabilities (e.g., worm propagation)
  - Understand limits of diversity
  - Categorization of "diversity space"
  - Identify unnecessary homogeneity in software: not just code but also environment, configuration, etc.
- Experimental:
  - Directed fault seeding:
    - Apply known exploits to target system
    - Apply all Genesis techniques
    - Evaluate variants' resistance to attack
  - Automated fault seeding

Automatic Fault Seeding
- Need test cases
- Need typical vulnerabilities, i.e., bugs
- Can typical bugs be synthesized?
- Prior work on syntactic transformations:
  - Simple mutations
  - Wide variety of resilience
  - Defects created with excellent statistical properties
- Plan to try this route

Automated Fault Seeding
[Figure: automated fault seeding pipeline. Labels: Target Software System (drawn as several populations of systems), Error Seeding, Acceptance Tests, Genesis Transformations, Vulnerability Assessment.]

State Of The Implementation
- Exists, ready to use:
  - CSDL
  - Calling convention spec
  - STRATA

Specific Questions Posed
- What are you trying to do (the problem you are addressing)?
- How will you show that you were successful?
- What are the implications of successful results (or less than successful results)?
- What is your technical approach?
- What is new, or hasn't been attempted?
- What significant problems do you anticipate, what makes your project difficult, and how do you plan to approach the difficulties?
- If successful, what have you thought about regarding transitioning the technology?
- If successful, what would be next?
Practical Problem
- If this works:
  - Building a system will require lots of computer time
  - Lots of systems will require LOTS of computer time
  - But it is just computer time
- Will not be able to just press CDs
- Will require a substantial engineering investment

Summary
- Automatic application of design diversity: macro, midi, micro
- Systematic application of data diversity: internal, external, all dimensions
- Seamless integration of the two
- Evaluation and assessment:
  - Directed fault seeding
  - Automated fault seeding

Questions?
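As a concrete illustration of the data diversity summarized above, the sin(x) re-expression from the Data Diversity slide ("choose a and b, repeat, vote") can be sketched as follows. This is a sketch under stated assumptions, not the GENESIS implementation: `faulty_sin`, `reexpressed_sin`, and `data_diverse_sin` are hypothetical names, the fault region around 30 degrees is invented for the demonstration, and rounding before the vote is one simple way to compare approximate results.

```python
import math
import random
from collections import Counter


def faulty_sin(deg: float) -> float:
    """Sine in degrees with an assumed design fault near 30 degrees."""
    if 29.5 <= deg <= 30.5:
        return 0.0  # hypothetical bug: wrong result in a narrow input region
    return math.sin(math.radians(deg))


def reexpressed_sin(sin_impl, x_deg: float, rng: random.Random) -> float:
    """Compute sin(x) from a randomly chosen split x = a + b (degrees).

    Uses sin(a + b) = sin(a)sin(90 - b) + sin(90 - a)sin(b), so the
    implementation under test is never called with x itself.
    """
    a = rng.uniform(-180.0, 180.0)
    b = x_deg - a
    return (sin_impl(a) * sin_impl(90.0 - b)
            + sin_impl(90.0 - a) * sin_impl(b))


def data_diverse_sin(sin_impl, x_deg: float, copies: int = 5,
                     places: int = 6, seed: int = 0) -> float:
    """Run several re-expressed copies and majority-vote the rounded results."""
    rng = random.Random(seed)
    results = [round(reexpressed_sin(sin_impl, x_deg, rng), places)
               for _ in range(copies)]
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= copies:
        raise RuntimeError("no majority among re-expressed copies")
    return value


direct = faulty_sin(30.0)                    # hits the faulty region
voted = data_diverse_sin(faulty_sin, 30.0)   # re-expression avoids it
```

Calling the faulty routine directly at 30 degrees returns 0.0, while the voted result over re-expressed inputs is 0.5. The same software is used in every copy; only the data presented to it differs.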