Enabling Trusted Software Integrity
Darko Kirovski
Microsoft Research
Milenko Drinić
Miodrag Potkonjak
Computer Science Department
University of California, Los Angeles
Problem Description
[Figure: stack memory layout during a buffer overrun, addresses running from LOW to HIGH. Labels: BUFFER, ATTACK CODE, LOCAL VARIABLES, RETURN ADDRESS, DATA, HIGH PRIORITY PROCESS, LOW PRIORITY PROCESS.]
Buffer Overrun
 Goal
– Exploit improperly implemented I/O
– Divert execution to attack code
 Simplest variant – Stack smashing
– “Smashing The Stack For Fun And Profit” by Aleph One (aleph1@underground.org), Phrack 49, 1996.
 Numerous variants exploit different vulnerabilities
– Tutorials on the Web with bug descriptions
– setuid() – Chen, Wagner, Dean, 2002.
What Can Be Done?
 StackGuard – Cowan et al., 1998
– Dummy value next to return address
 Bounds checking for all pointers – Jones, Kelly, 1995
– Slow in pointer-intensive software
 Static analysis – Wagner, 2000
– Verify all buffers – promising idea
– Too many false alarms
– Need to be resolved manually
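The stack-smashing attack and the StackGuard idea of a dummy value next to the return address can be illustrated with a toy model. The sketch below simulates a stack frame as a byte array; the layout, the sizes, and the names `Frame` and `unchecked_copy` are assumptions made for illustration and do not reproduce StackGuard's actual implementation.

```python
import secrets

BUF_SIZE = 8      # hypothetical size of the local buffer
CANARY_SIZE = 4   # dummy value placed between the buffer and the return address
RET_SIZE = 4

class Frame:
    """Toy stack frame laid out as [buffer | canary | return address]."""

    def __init__(self, return_address: bytes):
        assert len(return_address) == RET_SIZE
        self.canary = secrets.token_bytes(CANARY_SIZE)
        self.mem = bytearray(BUF_SIZE) + bytearray(self.canary) + bytearray(return_address)

    def unchecked_copy(self, data: bytes) -> None:
        """Models an I/O routine that copies input without a bounds check."""
        n = min(len(data), len(self.mem))
        self.mem[:n] = data[:n]

    def canary_intact(self) -> bool:
        """Check performed before returning: has the dummy value changed?"""
        return bytes(self.mem[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) == self.canary

    def return_address(self) -> bytes:
        return bytes(self.mem[BUF_SIZE + CANARY_SIZE:])
```

An input longer than `BUF_SIZE` overwrites the return address, but it has to pass through the canary first, so a check of `canary_intact()` before the function returns reveals the overrun unless the attacker guesses the dummy value.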
Intrusion Prevention
 Current approaches
– Intrusion detection
 It is easier to PREVENT than to DETECT
 Intrusion prevention system
– Adversary must solve a computationally difficult task to run programs in high priority
 Two types of binaries
– Ordinary
– Touched with a security wand
 Run-time verification
Outline
 How does the system work?
 Software installation
 Example of constraint embedding
 Run-time verification
 How to break the system?
 Effect on performance
An Intrusion Prevention System
TRUSTED MODE
Runs only trusted processes: OS + user defined. Full or controlled access.
PUBLIC MODE
Executes any code. Restricted access to resources. Script interpreters, distrusted programs, P2P networking, etc.
INSTALLATION MODE
Single process. Interrupts disabled. Input = software. Output = software with additional CPUID-dependent constraints. Atomic execution unit.
CPUID
Burnt-in. Not a privacy issue, because it is never revealed externally.
[Figure: flow diagram with labels Software, Keyed MAC (twice), Software trusted, Run, Abort.]
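The Run/Abort decision based on a keyed MAC can be sketched in its simplest form, where the MAC is appended to the code as a footer rather than embedded as constraints. This is only an illustration: HMAC-SHA256 is a stand-in chosen because it ships with Python, and the function names `install` and `launch` are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # SHA-256 digest size in bytes

def install(software: bytes, cpuid: bytes) -> bytes:
    """Installation mode: append a MAC keyed with the CPUID as a footer."""
    tag = hmac.new(cpuid, software, hashlib.sha256).digest()
    return software + tag

def launch(working_copy: bytes, cpuid: bytes) -> str:
    """Trusted mode: recompute the MAC and decide between RUN and ABORT."""
    code, tag = working_copy[:-TAG_LEN], working_copy[-TAG_LEN:]
    expected = hmac.new(cpuid, code, hashlib.sha256).digest()
    return "RUN" if hmac.compare_digest(tag, expected) else "ABORT"
```

Because the key never leaves the processor, an adversary who patches the code cannot recompute a matching footer, and a working copy installed on one machine does not verify on another.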
Software Installation
 Installer is on-chip or on an EPROM with verified contents
 Single process
 I/O – memory mapped
 Interrupts disabled
 Used registers, memory overwritten
~ BOOT on PCs
[Figure: SPEF Installation. Labels: Software master-copy, I-block, TI hash, Domain ordering, CPUID, Encrypt (3DES), Random bitstream, Constraint embedding, I-block, Software working-copy.]
GOAL: embed constraints w/o revealing CPUID.
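A minimal sketch of how a CPUID-dependent bitstream might be derived from an I-block follows. Two assumptions are made loudly: sorting the instructions stands in for domain ordering, so that the hash does not depend on the current instruction order, and HMAC-SHA256 replaces the 3DES encryption step because 3DES is not in the Python standard library. The names `ti_hash` and `bitstream` are hypothetical.

```python
import hashlib
import hmac

def ti_hash(i_block: list[str]) -> bytes:
    """Order-independent hash: sort first (stand-in for domain ordering)."""
    canonical = "\n".join(sorted(i_block)).encode()
    return hashlib.sha256(canonical).digest()

def bitstream(i_block: list[str], cpuid: bytes, n_bits: int = 64) -> str:
    """Keyed pseudo-random bits; HMAC-SHA256 stands in for the 3DES step."""
    digest = hmac.new(cpuid, ti_hash(i_block), hashlib.sha256).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:n_bits]  # at most 256 bits are available in this sketch
```

Since only the bitstream shapes the working copy, and the bitstream is the output of a keyed function, the constraints depend on the CPUID without exposing it.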
Example: Instruction Scheduling
SPEF - Instruction Rescheduling
[Figure labels: Software master-copy, Domain ordering, TI hash, CPU ID, Encrypt (3DES), Random bitstream, Constraint embedding / verification.]
Original instruction order:
ADD... MOV... SUB... MOV... MOV... DIV... MULT... XOR... SUB... JUMP...
Reordered instruction order:
DIV... MULT... SUB... MOV... MOV... XOR... SUB... ADD... MOV... JUMP...
How Does the Bitstream Reorder Ops?
a) Initial order of instructions and their possible positions
(1) 0x0080e0 LDR r1,[r8,#0]
(2) 0x0080e4 LDR r0,[r9,#0]
(3) 0x0080e8 MOV r3,r5
(4) 0x0080ec MUL r2,r0,r1
(5) 0x0080f0 MOV r1,#1
(6) 0x0080f4 LDR r0,[r6,#0]
[Figure: grid of positions per instruction; legend: initial position, possible position, conditional possible position.]
b) Dependency graph
[Figure: dependency graph over instructions (1) to (6).]
c) Sample bit-stream
1010...0110
d) Instruction ordering procedure
[Figure: table over control steps 1 to 5 listing the available instructions and their encodings (00, 01, 10, 11). Part of bitstream used: 10, 1, 0. Selected instructions: (3), (2), (1), (4), (5).]
e) Final order of instructions
0x0080e8 MOV r3,r5
0x0080e4 LDR r0,[r9,#0]
0x0080e0 LDR r1,[r8,#0]
0x0080ec MUL r2,r0,r1
0x0080f0 MOV r1,#1
0x0080f4 LDR r0,[r6,#0]
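The ordering procedure for the six ARM instructions on this slide can be approximated with a short sketch. It is a simplification under stated assumptions: the dependency sets in `DEPS` are my reading of the register usage (MUL needs both loads; the later writes to r1 and r0 must follow the MUL that reads them), the "possible positions" limits are ignored, and reducing the selected bits modulo the number of ready instructions is an assumption. The function name `reorder` is hypothetical.

```python
def reorder(n, deps, bits):
    """Pick one ready instruction per control step, steered by the bitstream.

    n:    number of instructions, labeled 1..n in their canonical order
    deps: {instruction: set of instructions that must be placed first}
    bits: string of '0' and '1' characters
    """
    order, pos = [], 0
    remaining = set(range(1, n + 1))
    while remaining:
        done = set(order)
        ready = sorted(i for i in remaining if deps.get(i, set()) <= done)
        if not ready:
            raise ValueError("dependency cycle")
        if len(ready) == 1:
            choice = ready[0]                      # forced move, no bits consumed
        else:
            width = (len(ready) - 1).bit_length()  # bits needed to index `ready`
            chunk = bits[pos:pos + width]
            if len(chunk) < width:
                raise ValueError("bitstream too short")
            pos += width
            choice = ready[int(chunk, 2) % len(ready)]  # modulo: an assumption
        order.append(choice)
        remaining.remove(choice)
    return order

# Assumed dependencies for instructions (1)..(6) of the slide.
DEPS = {4: {1, 2}, 5: {1, 4}, 6: {2, 4}}
print(reorder(6, DEPS, "1010"))  # → [3, 2, 1, 4, 5, 6]
```

With the first four bits of the sample bit-stream, the sketch consumes 10, then 1, then 0, and arrives at the order (3), (2), (1), (4), (5), (6), which agrees with the final order shown on the slide.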
Constraint Embedding Techniques
 Entropy of program representation is high
 Reduce entropy w/ constraints for 50+ bits with preserved performance
 Exact entropy reduction unique for each CPUID
 Constraint types
– Requirements
• High entropy
• Functional transparency
• Transformation invariance
• Effective implementation
• Low performance overhead
– Examples
• Instruction rescheduling
• Register assignment
• Basic block reordering
• Conditional branch selection
• Filling unused opcode fields
• Toggling signs of operands
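To make the link between degrees of freedom and embedded bits concrete for one constraint type, instruction rescheduling, the sketch below counts how many instruction orders respect a dependency graph and converts that count to bits. The six-instruction dependency set is a hypothetical example and `count_orderings` is an illustrative name, not part of the system described here.

```python
from functools import lru_cache
from math import log2

def count_orderings(n, deps):
    """Number of instruction orders that respect every dependency."""
    preds = {i: frozenset(deps.get(i, ())) for i in range(1, n + 1)}

    @lru_cache(maxsize=None)
    def count(done):
        if len(done) == n:
            return 1
        return sum(count(done | {i})
                   for i in range(1, n + 1)
                   if i not in done and preds[i] <= done)

    return count(frozenset())

# Hypothetical block: 4 needs 1 and 2; 5 and 6 need 4; 3 is unconstrained.
deps = {4: {1, 2}, 5: {4}, 6: {4}}
total = count_orderings(6, deps)
print(total, f"{log2(total):.2f}")  # → 24 4.58
```

Fixing one of the 24 valid orders removes about 4.58 bits of freedom from this block; combining several constraint types over a larger block is what lets the total exceed 50 bits.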
Run-time Code Verification
 ARM instruction set and simulated system
 50 cycles
 20K gates
 HW support?
[Figure: CPU + SPEF Verifier. Labels: Software working-copy, I-block, Cache line, I-block buffer, Domain ordering, TI hash, CPU ID, Encrypt (3DES), Random bitstream, Constraint verification, ABORT or RUN, Traditional CPU architecture.]
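The verification flow can be sketched end to end under strong simplifying assumptions: all instructions in the I-block are treated as mutually independent, so any permutation is valid; sorting stands in for domain ordering; and HMAC-SHA256 stands in for the 3DES step. The names `keyed_order`, `install`, and `verify` are hypothetical.

```python
import hashlib
import hmac

def keyed_order(i_block, cpuid):
    """Order that the keyed bitstream dictates for this set of instructions."""
    pool = sorted(i_block)                         # stand-in for domain ordering
    ti = hashlib.sha256("\n".join(pool).encode()).digest()
    stream = int.from_bytes(hmac.new(cpuid, ti, hashlib.sha256).digest(), "big")
    order = []
    while pool:
        stream, idx = divmod(stream, len(pool))    # consume part of the stream
        order.append(pool.pop(idx))
    return order

def install(i_block, cpuid):
    """Installation: emit the working copy in the keyed order."""
    return keyed_order(i_block, cpuid)

def verify(i_block, cpuid):
    """Run time: re-derive the keyed order and compare it with the block."""
    return "RUN" if list(i_block) == keyed_order(i_block, cpuid) else "ABORT"
```

The verifier never stores a separate signature: it recomputes what the order should be from the block's contents and the CPUID, then checks that the block as fetched already satisfies it.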
How to Break the System?
 Cryptographically secure keyed MAC
– Hard to extract CPUID from working-copies
– Hard to create an I-block with CPUID constraints satisfied w/o the CPUID
 Patch low entropy instruction blocks
– I-block with low entropy? Example:
• I-block = one instruction and all others NOPs
– Hardware must detect I-blocks with low entropy
• Count and limit domain cardinality
• Done during domain ordering
 Patch I-blocks from working copies
– Difficult? Hard to evaluate w/o a lot of software
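The low-entropy check can be sketched as a simple count. Here "domain cardinality" is approximated as the number of distinct non-NOP instructions in an I-block, which is an assumption about what the hardware would count; the threshold `MIN_DISTINCT` and the function names are hypothetical.

```python
MIN_DISTINCT = 16  # hypothetical lower bound on domain cardinality

def domain_cardinality(i_block):
    """Count distinct instructions, ignoring NOP padding."""
    return len({ins for ins in i_block if ins.strip().upper() != "NOP"})

def accept(i_block, minimum=MIN_DISTINCT):
    """Reject blocks that leave too few degrees of freedom to constrain."""
    return domain_cardinality(i_block) >= minimum

# The attack from the slide: one real instruction padded with NOPs.
patched = ["MOV r0,#1"] + ["NOP"] * 63
print(accept(patched))  # → False
```

A block that fails this check offers an adversary only a handful of possible arrangements to try, so refusing to run it closes the brute-force route.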
Performance
 Simulated w/ ARMulator
 ARM instruction set
 MediaBench suite
 Embedded bits of entropy
[Figure: Cumulative Degrees of Freedom For All Constraint Types. One histogram per benchmark; x-axis: embedded bits of entropy (0 to 300), y-axis: 64-instruction block count.]
– Jpeg Encode: 242 blocks, mean 142
– Jpeg Decode: 330 blocks, mean 152
– Pegwit: 380 blocks, mean 146
– Mpeg Encode and Mpeg Decode: 343 blocks, mean 136 and 216 blocks, mean 140 (panel assignment not recoverable)
 Performance effect
– 13-25% overhead
– 7-17% with a cache that logs TI-hashes
[Figure: Effective CPI (0 to 16) by cache size (1K FA, 1K DM, 2K FA, 2K DM, 4K FA, 4K DM) for three configurations: No Verification, Verification without TIH cache, Verification with TIH cache.]
Summary
 Intrusion prevention
 On-line software verification for authenticity
 Keyed message authentication code
– Stored as footer
– Stored as constraints
• 50% decrease in code size overhead
 Public and trusted execution mode
 Relatively high/low performance overhead
– No hardware acceleration
– 20% overhead sets back Moore’s Law 4.5 months