Enabling Trusted Software Integrity

Darko Kirovski
Microsoft Research

Milenko Drinić, Miodrag Potkonjak
Computer Science Department
University of California, Los Angeles

Problem Description
[Diagram labels: BUFFER, ATTACK CODE, LOCAL VARIABLES, RETURN ADDRESS, DATA, LOW, HIGH, LOW PRIORITY PROCESS, HIGH PRIORITY PROCESS]

Buffer Overrun
Goal
  – Exploit improperly implemented I/O
  – Divert execution to attack code
Simplest variant
  – Stack smashing
  – “Smashing The Stack For Fun And Profit” by Aleph One (aleph1@underground.org), Phrack 49, 1996.
Numerous variants exploit different vulnerabilities
  – Tutorials on the Web with bug descriptions
  – setuid() (Chen, Wagner, Dean, 2002)

What Can Be Done?
StackGuard
  – Cowan et al., 1998
  – Dummy value next to return address
Bounds checking for all pointers
  – Jones, Kelly, 1995
  – Slow in pointer-intensive software
Static analysis
  – Wagner, 2000
  – Verify all buffers (promising idea)
  – Too many false alarms
  – Need to be resolved manually

Intrusion Prevention
Current approaches
  – Intrusion detection
It is easier to PREVENT than to DETECT
Intrusion prevention system
  – Adversary must solve a computationally difficult task to run programs in high priority
Two types of binaries
  – Ordinary
  – Touched with a security wand
Run-time verification

Outline
How does the system work?
Software installation
Example of constraint embedding
Run-time verification
How to break the system?
Effect on performance

An Intrusion Prevention System
TRUSTED MODE
  – Runs only trusted processes: OS + user defined.
  – Full or controlled access.
PUBLIC MODE
  – Executes any code.
  – Restricted access to resources.
  – Script interpreters, distrusted programs, P2P networking, etc.
INSTALLATION MODE
  – Single process.
  – Interrupts disabled.
  – Input = software.
  – Output = software with additional CPUID-dependent constraints.
  – Atomic execution unit.
[Diagram labels: Software, Keyed MAC, trusted, Abort, Run, CPUID]
CPUID
  – Burnt-in.
  – Not a privacy issue, because it is never revealed externally.

Software Installation
Installer is on-chip or on an EPROM with verified contents
Single process
I/O: memory mapped
Interrupts disabled
Used registers, memory overwritten
~ BOOT on PCs
GOAL: embed constraints w/o revealing CPUID.
[Diagram labels: Software master-copy, I-block, SPEF Installation, TI hash, Domain ordering, Constraint embedding, Encrypt (3DES), CPUID, Random bitstream, I-block, Software working-copy]

Example: Instruction Scheduling
SPEF - Instruction Rescheduling
[Diagram labels: Software master-copy, Domain ordering, TI hash, Encrypt (3DES), CPU ID, Random bitstream, Constraint embedding, verification, Domain ordering]
Instruction order before rescheduling:
  ADD... MOV... SUB... MOV... MOV... DIV... MULT... XOR... SUB... JUMP...
Instruction order after rescheduling:
  DIV... MULT... SUB... MOV... MOV... XOR... SUB... ADD... MOV... JUMP...

How Does the Bitstream Reorder Ops?
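The figure on this slide reorders six ARM instructions under the control of a bitstream. Below is a minimal sketch of one way such a selection procedure could work, not the authors' exact algorithm: at each step the instructions whose predecessors are already placed form the available set, and just enough bits are read from the bitstream to pick one of them. The dependency sets in `DEPENDS_ON` are an assumption inferred from register usage in the figure's instructions, the modulo rule for out-of-range bit values and the zero padding of an exhausted bitstream are also assumptions, and `reschedule` is an illustrative name.

```python
# Instructions from the slide's figure, keyed by their initial position.
INSTRUCTIONS = {
    1: "LDR r1,[r8,#0]",
    2: "LDR r0,[r9,#0]",
    3: "MOV r3,r5",
    4: "MUL r2,r0,r1",
    5: "MOV r1,#1",
    6: "LDR r0,[r6,#0]",
}

# Assumed predecessors, inferred from register usage (not taken from the slide):
# MUL reads r0 and r1; instructions 5 and 6 overwrite registers that MUL reads.
DEPENDS_ON = {1: set(), 2: set(), 3: set(), 4: {1, 2}, 5: {4}, 6: {4}}


def reschedule(bits):
    """Order the instructions using a bitstream given as a string of '0'/'1'.

    Returns (order, used): the selected positions and the bits consumed
    at each step (an empty string when only one instruction was available).
    """
    order, used, pos = [], [], 0
    remaining = set(INSTRUCTIONS)
    while remaining:
        placed = set(order)
        ready = sorted(i for i in remaining if DEPENDS_ON[i] <= placed)
        if len(ready) == 1:
            choice, chunk = ready[0], ""
        else:
            # Smallest number of bits that can index every available choice.
            width = (len(ready) - 1).bit_length()
            chunk = bits[pos:pos + width].ljust(width, "0")
            pos += width
            # Assumption: out-of-range values wrap around.
            choice = ready[int(chunk, 2) % len(ready)]
        order.append(choice)
        used.append(chunk)
        remaining.remove(choice)
    return order, used


order, used = reschedule("1010")
for position in order:
    print(f"({position}) {INSTRUCTIONS[position]}")
```

With the first four bits of the sample bitstream (1010), this sketch consumes 10, 1, and 0 and selects (3) (2) (1) (4) (5) (6), which agrees with the final order shown in the figure. A verifier that knows the same bitstream can replay the procedure and compare the result with the order it finds in the working copy.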
[Figure with five panels]

a) Initial order of instructions and their possible positions

  Position  Address   Instruction
  (1)       0x0080e0  LDR r1,[r8,#0]
  (2)       0x0080e4  LDR r0,[r9,#0]
  (3)       0x0080e8  MOV r3,r5
  (4)       0x0080ec  MUL r2,r0,r1
  (5)       0x0080f0  MOV r1,#1
  (6)       0x0080f4  LDR r0,[r6,#0]

  Legend: initial position, possible position, conditional possible position

b) Dependency graph [not recoverable from the extracted text]

c) Sample bit-stream: 1010...0110

d) Instruction ordering procedure
  [Table columns: Control step (1 to 5), Available instructions, Encoding (00, 01, 10, 11), Part of bitstream used, Selected instruction; the cell layout is not recoverable from the extracted text]
  Part of bitstream used: 10, 1, 0
  Selected instructions: (3) (2) (1) (4) (5)

e) Final order of instructions

  Address   Instruction
  0x0080e8  MOV r3,r5
  0x0080e4  LDR r0,[r9,#0]
  0x0080e0  LDR r1,[r8,#0]
  0x0080ec  MUL r2,r0,r1
  0x0080f0  MOV r1,#1
  0x0080f4  LDR r0,[r6,#0]

Constraint Embedding Techniques
Entropy of program representation is high
Reduce entropy w/ constraints for 50+ bits with preserved performance
Exact entropy reduction unique for each CPUID
Constraint types
  – Requirements
    • High entropy
    • Functional transparency
    • Transformation invariance
    • Effective implementation
    • Low performance overhead
  – Examples
    • Instruction rescheduling
    • Register assignment
    • Basic block reordering
    • Conditional branch selection
    • Filling unused opcode fields
    • Toggling signs of operands

Run-time Code Verification
[Diagram labels: Software working-copy, I-block, CPU + SPEF Verifier, CPU ID, TI hash, I-block buffer, Random bitstream, Encrypt (3DES), Domain ordering, Constraint verification, Traditional CPU architecture, Cache line, ABORT or RUN]
HW support?
  – ARM instruction set and simulated system
  – 50 cycles
  – 20K gates

How to Break the System?
Cryptographically secure keyed MAC
  – Hard to extract CPUID from working-copies
  – Hard to create an I-block with CPUID constraints satisfied w/o the CPUID
Patch low entropy instruction blocks
  – I-block with low entropy? Example:
    • I-block = one instruction and all other NOPs
  – Hardware must detect I-blocks with low entropy
    • Count and limit domain cardinality
    • Done during domain ordering
Patch I-blocks from working copies
  – Difficult? Hard to evaluate w/o a lot of software

Performance
Simulated w/ ARMulator
ARM instruction set
MediaBench suite
[Figure: Embedded bits of entropy. Histograms for Mpeg Encode, Mpeg Decode, Jpeg Encode, Jpeg Decode, and Pegwit; x-axis: Cumulative Degrees of Freedom For All Constraint Types; y-axis: 64-Instruction Block Count. Block counts and means in extraction order: 343 blocks, mean 136; 242 blocks, mean 142; 330 blocks, mean 152; 380 blocks, mean 146; 216 blocks, mean 140.]
Performance effect
  – 13-25% overhead
  – 7-17% with a cache that logs TI-hashes
[Figure: Effective CPI by cache size (1K FA, 1K DM, 2K FA, 2K DM, 4K FA, 4K DM). Series: No Verification, Verification without TIH cache, Verification with TIH cache.]

Summary
Intrusion prevention
On-line software verification for authenticity
Keyed message authentication code
  – Stored as footer
  – Stored as constraints
    • 50% decrease in code size overhead
Public and trusted execution mode
Relatively high/low performance overhead
  – No hardware acceleration
  – 20%: sets back Moore’s Law 4.5 months
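The closing figure (a 20% overhead setting back Moore's Law by about 4.5 months) can be checked with a short calculation. The sketch below assumes that performance doubles every 18 months, a period the slides do not state; `months_set_back` and `doubling_months` are illustrative names.

```python
from math import log2


def months_set_back(overhead, doubling_months=18.0):
    """Months of performance growth needed to absorb a fractional slowdown.

    overhead: fractional slowdown, e.g. 0.20 for 20%.
    doubling_months: assumed performance doubling period (not from the slides).
    """
    return doubling_months * log2(1.0 + overhead)


print(round(months_set_back(0.20), 1))
```

Under that assumption the result is about 4.7 months, in the same range as the 4.5 months quoted on the slide; a slightly shorter doubling period would close the gap.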
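The Summary lists a keyed MAC stored as a footer as one of the two storage options. A minimal sketch of that variant follows, under loud assumptions: HMAC-SHA256 from the Python standard library stands in for the 3DES-based construction shown on the slides, the CPUID is modeled as a secret byte string, and `install`, `verify`, and `TAG_LEN` are hypothetical names.

```python
import hashlib
import hmac

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def install(i_block: bytes, cpuid: bytes) -> bytes:
    """Installation step: append a MAC keyed with the CPUID as a footer."""
    tag = hmac.new(cpuid, i_block, hashlib.sha256).digest()
    return i_block + tag


def verify(working_copy: bytes, cpuid: bytes) -> str:
    """Run-time check: recompute the MAC and decide RUN or ABORT."""
    if len(working_copy) < TAG_LEN:
        return "ABORT"
    i_block, tag = working_copy[:-TAG_LEN], working_copy[-TAG_LEN:]
    expected = hmac.new(cpuid, i_block, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return "RUN" if hmac.compare_digest(tag, expected) else "ABORT"


cpuid = bytes(range(16))  # placeholder value, for illustration only
working_copy = install(b"ADD MOV SUB JUMP", cpuid)
print(verify(working_copy, cpuid))
print(verify(b"X" + working_copy[1:], cpuid))
```

The footer costs extra bytes per I-block; the constraint-based variant in the Summary encodes the same kind of check into the instruction layout itself, which is where the slide's reduction in code size overhead comes from.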