Flexible Hardware Acceleration for
Instruction-Grain Program Monitoring
Shimin Chen
Joint work with
Michael Kozuch (1), Theodoros Strigkos (2), Babak Falsafi (3),
Phillip B. Gibbons (1), Todd C. Mowry (1,2), Vijaya Ramachandran (4),
Olatunji Ruwase (2), Michael Ryan (1), Evangelos Vlachos (2)
(1) Intel Research Pittsburgh, (2) CMU, (3) EPFL, (4) UT Austin
Instruction-Grain Monitoring
• Software often contains bugs
– Memory corruptions, data races, …, crashes
– Security attacks are often designed to exploit bugs
• Instruction-grain lifeguards can help
– Dynamic monitoring: during application execution
– Instruction-grain: e.g., memory access, data flow
• Enables a wide range of powerful lifeguards
Application
Lifeguard
Example Instruction-Grain Lifeguards
• AddrCheck:
– Monitor malloc/free, memory accesses
– Check if all memory accesses visit allocated memory regions
[Nethercote’04]
• MemCheck: AddrCheck + check uninitialized values
– Copying partially uninitialized structures is not an error
– Lazy error detection to avoid many false positives
– Track propagation of uninitialized values
[Nethercote & Seward ’03 ’07]
• TaintCheck: detect overwrite-based security exploits
– Tainted data: data from network or disk
– Track propagation of tainted data to detect violations
[Newsome & Song’05]
• LockSet: detect data races in parallel programs
[Savage et al.’97]
Design Space of Support Platform
[Figure: design space with performance (poor to good) on one axis and generality (specific lifeguard to general purpose: wide range of lifeguards) on the other]
• Lifeguard-specific hardware: [Crandall & Chong’04], [Dalton et al’07], [Shetty et al’06], [Shi et al’06], [Suh et al’04], [Venkataramani’07], [Venkataramani’08], [Zhou et al’07]
• General-purpose HW improving DBI (3-8X slowdowns): [Chen et al’06], [Corliss’03]
• Dynamic binary instrumentation (DBI) (10-100X slowdowns): [Bruening’04], [Luk et al’05], [Nethercote’04]
• This paper: targets good performance for a wide range of lifeguards
Outline
• Introduction
• Background
• Three Hardware Acceleration Techniques
• Experimental Evaluation
• Conclusion
Example Lifeguard: TaintCheck
[Newsome & Song’05]
Purpose: detect overwrite-based security exploits
– Metadata kept for application memory and registers
– Tainted data: data from network or disk
– Track taint propagation
– Detect violation: e.g., tainted jump target address
Application        TaintCheck Lifeguard
mov %eax ← A       taint(%eax) = taint(A)
mov B ← %eax       taint(B) = taint(%eax)
add %ebx ← D       taint(%ebx) |= taint(D)
jmp *(F)           if (taint(F)==1) error;
Detect exploit before attack code takes control
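The propagation rules on this slide can be sketched in plain C. This is a minimal illustration, not the lifeguard's actual code: the table sizes, the function names (`taint_input`, `on_mem_to_reg`, `on_reg_to_mem`, `on_dest_reg_op_mem`, `on_indirect_jump`), and the one-flag-per-byte layout are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS  8u       /* hypothetical register count */
#define MEM_BYTES 65536u   /* hypothetical size of the monitored address range */

static uint8_t reg_taint[NUM_REGS];   /* one taint flag per register */
static uint8_t mem_taint[MEM_BYTES];  /* one taint flag per application byte */

/* Mark n bytes starting at addr as tainted (e.g., data read from the network). */
void taint_input(uint32_t addr, uint32_t n) {
    assert(addr <= MEM_BYTES && n <= MEM_BYTES - addr);
    memset(&mem_taint[addr], 1, n);
}

/* mov reg <- mem: taint(reg) = taint(mem) */
void on_mem_to_reg(unsigned reg, uint32_t addr) {
    assert(reg < NUM_REGS && addr < MEM_BYTES);
    reg_taint[reg] = mem_taint[addr];
}

/* mov mem <- reg: taint(mem) = taint(reg) */
void on_reg_to_mem(uint32_t addr, unsigned reg) {
    assert(reg < NUM_REGS && addr < MEM_BYTES);
    mem_taint[addr] = reg_taint[reg];
}

/* reg <- reg op mem: taint(reg) |= taint(mem) */
void on_dest_reg_op_mem(unsigned reg, uint32_t addr) {
    assert(reg < NUM_REGS && addr < MEM_BYTES);
    reg_taint[reg] |= mem_taint[addr];
}

/* jmp *(mem): returns 1 when the jump target is tainted (a violation). */
int on_indirect_jump(uint32_t addr) {
    assert(addr < MEM_BYTES);
    return mem_taint[addr] != 0;
}
```

Calling `taint_input` on an input buffer, then replaying the slide's `mov`/`mov` pair through `on_mem_to_reg` and `on_reg_to_mem`, carries the taint to the destination, where `on_indirect_jump` reports it.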
TaintCheck w/ Detailed Tracking
TaintCheck:
– Detect violation
– 1 taint bit / application byte
TaintCheck w/ detailed tracking:
[Newsome & Song’05]
– Construct taint propagation trail
– More detailed metadata per application location
• PC of Instruction that tainted this location
• “tainted from” address
[Figure: taint propagation trail from Input to Violation]
• Not supported by previous lifeguard-specific HW
Instruction-Grain Lifeguard Metadata Characteristics
• Organization varies
– per application byte/word
– size, format, semantics vary greatly
• Frequently updated
– e.g., propagation tracking
• Frequently checked
– e.g., memory accesses
Lifeguard Support
[Figure: the unmodified application produces events; event capture and delivery passes them to the event handlers of the software lifeguard, which map, update, and check metadata]
• Rare events: e.g., malloc/free, system calls
• Frequent events: e.g., memory access, data movement
• Event capture and delivery: general-purpose HW improving DBI
• Performance bottlenecks: metadata mapping, updates, and checks
Our Contributions
[Figure: the lifeguard-support diagram (unmodified application, event capture and delivery, software lifeguard event handlers for rare and frequent events) annotated with M-TLB at metadata mapping, IT at update, and IF at check]
• Metadata-TLB (M-TLB): for metadata mapping
• Inheritance Tracking (IT): for metadata updates
• Idempotent Filters (IF): for metadata checks
Outline
• Introduction
• Background
• Three Hardware Acceleration Techniques
– Metadata-TLB
– Inheritance Tracking
– Idempotent Filters
• Experimental Evaluation
• Conclusion
Metadata-TLB: Motivation
• Metadata per app byte/word
– Element size may vary
• Two-level structure: level-1 index pointing to level-2 metadata chunks
– Robustness & space efficiency
• Mapping: application address → metadata address
– Frequently used in almost every handler
– Can be very costly
Example (TaintCheck)
void dest_reg_op_mem_4B (UINT32 src_addr /*%eax*/, UINT32 dest_reg /*%edx*/)
{
  // app instruction type: dest_reg ← dest_reg op mem(src_addr)
  // handler operation: reg_taint(dest_reg) |= mem_taint(src_addr)
  map *mp = level1_index[src_addr>>16];   // mov %eax, %ecx
                                          // shr $16, %ecx
                                          // mov level1_index(,%ecx,4), %ecx
  int idx = (src_addr & 0xffff)>>2;       // and $0xffff, %eax
                                          // shr $2, %eax
  UChar mem_taint = mp[idx];              // movzbl (%ecx,%eax,1), %eax
  reg_taint[dest_reg] |= mem_taint;       // or %al, reg_taint(%edx)
  nlba ();                                // nlba
}
Metadata mapping takes 5 out of 8 instructions!
Our Solution: Metadata-TLB
• A TLB-like HW associative lookup table
• LMA (Load Metadata Address) instruction:
– Application address → lifeguard metadata address
• Managed by (user-mode) lifeguard software
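As a rough software model of what an LMA lookup does, the sketch below caches chunk translations in a small table and falls back to the two-level structure on a miss. It is only an illustration under stated assumptions: the table is direct-mapped for brevity (the slide describes an associative table), the entry count, the names `lma`, `map_slow`, and `mtlb_misses` are hypothetical, and the geometry (64 KB chunks, one metadata byte per 4 application bytes) is borrowed from the TaintCheck handler example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_BITS   16u  /* level-1 index is app_addr >> 16, as in the handler example */
#define META_SHIFT   2u   /* one metadata byte per 4 application bytes */
#define MTLB_ENTRIES 16u  /* hypothetical number of M-TLB entries */

typedef struct { int valid; uint32_t tag; uint8_t *chunk; } mtlb_entry;

static uint8_t *level1_index[1u << (32u - CHUNK_BITS)]; /* level-2 chunk pointers */
static mtlb_entry mtlb[MTLB_ENTRIES];
unsigned mtlb_misses;  /* counts slow-path lookups */

/* Slow path: walk the two-level structure, allocating the chunk on first use. */
static uint8_t *map_slow(uint32_t tag) {
    if (level1_index[tag] == NULL) {
        level1_index[tag] = calloc((1u << CHUNK_BITS) >> META_SHIFT, 1);
        assert(level1_index[tag] != NULL);
    }
    return level1_index[tag];
}

/* Model of LMA: application address -> lifeguard metadata address. */
uint8_t *lma(uint32_t app_addr) {
    uint32_t tag = app_addr >> CHUNK_BITS;
    mtlb_entry *e = &mtlb[tag % MTLB_ENTRIES];
    if (!e->valid || e->tag != tag) {   /* miss: refill the entry in software */
        mtlb_misses++;
        e->valid = 1;
        e->tag = tag;
        e->chunk = map_slow(tag);
    }
    uint32_t offset = app_addr & ((1u << CHUNK_BITS) - 1u);
    return e->chunk + (offset >> META_SHIFT);
}
```

On a hit, the translation costs one table probe plus an add, which is the work the hardware instruction is meant to absorb; only misses pay for the two-level walk.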
Example (TaintCheck) w/ M-TLB
void dest_reg_op_mem_4B (UINT32 src_addr /*%eax*/, UINT32 dest_reg /*%edx*/)
{
  // app instruction type: dest_reg ← dest_reg op mem(src_addr)
  // handler operation: reg_taint(dest_reg) |= mem_taint(src_addr)
  // LMA replaces the level1_index lookup and index computation
  UChar *p = LMA_macro(src_addr);         // LMA %eax, %ecx
  UChar mem_taint = *p;                   // mov (%ecx), %al
  reg_taint[dest_reg] |= mem_taint;       // or %al, reg_taint(%edx)
  nlba ();                                // nlba
}
Reduces handler size by half (from 8 to 4 instructions)!
Inheritance Tracking: Motivation
• Propagation tracking is expensive
– Metadata updates for almost every app instruction
• Previous hardware solutions track propagation
– Automatically update metadata in hardware
– Problem: only support simple metadata semantics
• e.g., do not support TaintCheck w/ detailed tracking
• Our goal: flexibility AND performance
• Idea: inheritance structure is common, so let’s track inheritance in hardware!
Problem with General Inheritance Tracking
Application        Propagation Tracking        Inheritance Tracking
mov %eax ← A       taint(%eax) = taint(A)      %eax inherits from A
mov B ← %eax       taint(B) = taint(%eax)      B inherits from %eax
add %ebx ← D       taint(%ebx) |= taint(D)     insert D into %ebx’s inherit-from list
Problem: state explosion for binary operations!
Unary Inheritance Tracking
• Many lifeguards can take advantage of unary IT:
– MemCheck
– TaintCheck
• Large performance improvements if used
– Can be disabled if unary IT does not match the lifeguard
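A simplified software model of unary inheritance tracking for registers is sketched below. It is not the state transition table from the paper: the event names follow the slides, but the `app_*` function names, the `it_table` layout, the event log, and the conflict rule (flush any register that inherits from a location about to be overwritten) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS   8u   /* hypothetical register count */
#define MAX_EVENTS 64u  /* hypothetical log capacity */

typedef enum { MEM_TO_REG, REG_TO_MEM, MEM_TO_MEM, DEST_REG_OP_MEM } event_kind;
typedef struct { event_kind kind; uint32_t dst, src; } event;

/* Per register: does it currently inherit its metadata from one address? */
static struct { int inherits; uint32_t from; } it_table[NUM_REGS];
event delivered[MAX_EVENTS];   /* events actually sent to the lifeguard */
unsigned num_delivered;

static void deliver(event_kind kind, uint32_t dst, uint32_t src) {
    assert(num_delivered < MAX_EVENTS);
    delivered[num_delivered++] = (event){ kind, dst, src };
}

/* Materialize a register's metadata by delivering the deferred load event. */
static void flush_reg(unsigned r) {
    if (it_table[r].inherits) {
        deliver(MEM_TO_REG, r, it_table[r].from);
        it_table[r].inherits = 0;
    }
}

/* mov rd <- mem: record the inheritance, deliver nothing yet. */
void app_mem_to_reg(unsigned rd, uint32_t addr) {
    assert(rd < NUM_REGS);
    it_table[rd].inherits = 1;
    it_table[rd].from = addr;
}

/* mov mem <- rs: a tracked register turns the store into mem_to_mem. */
void app_reg_to_mem(uint32_t addr, unsigned rs) {
    assert(rs < NUM_REGS);
    for (unsigned r = 0; r < NUM_REGS; r++)   /* assumed conflict rule */
        if (r != rs && it_table[r].inherits && it_table[r].from == addr)
            flush_reg(r);
    if (!it_table[rs].inherits)
        deliver(REG_TO_MEM, addr, rs);
    else if (it_table[rs].from != addr)
        deliver(MEM_TO_MEM, addr, it_table[rs].from);
    /* storing back to the inherited location leaves metadata unchanged */
}

/* rd <- rd op mem: binary operation, so stop tracking rd and deliver both. */
void app_dest_reg_op_mem(unsigned rd, uint32_t addr) {
    assert(rd < NUM_REGS);
    flush_reg(rd);
    deliver(DEST_REG_OP_MEM, rd, addr);
}
```

In this model a load followed by a store of the same register produces a single `MEM_TO_MEM` event instead of two, while a binary operation forces the deferred event out first.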
Tracking Register Inheritance
[Figure: the original event and the register entries IT(%rs) and IT(%rd) of the IT table drive a state transition that determines the transformed event to deliver]
More details in the paper:
• IT table and state transition table details
• Conflict detection
Example
Application        Events before Inheritance Tracking
mov %eax ← A       mem_to_reg
mov B ← %eax       reg_to_mem
mov %ebx ← C       mem_to_reg
add %ebx ← D       dest_reg_op_mem
mov E ← %ebx       reg_to_mem
Events with Inheritance Tracking: mem_to_mem, imm_to_mem
Can significantly reduce metadata update events!
Idempotent Filters: Idea
• Typically, metadata checks give the same result if
– Event parameters are the same and
– Metadata are the same
• Idea: filter out idempotent (redundant) events
• For example:
– AddrCheck:
• After checking that a memory location is allocated
• Subsequent loads/stores to the same location are safe
• Until the next free() event
– LockSet: (surprisingly)
• In between synchronization events (e.g., lock/unlock)
• Check first load to a location
• Check first store to a location
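The filtering idea can be modeled with a small cache of recently checked addresses. The sketch below is a hypothetical direct-mapped (1-way) version: the entry count, the word-granularity index, and the names `filter_check_event` and `filter_invalidate_all` are assumptions for illustration, with the invalidation standing in for events such as `free()` or lock/unlock that change the metadata.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FILTER_ENTRIES 32u  /* hypothetical number of filter entries */

typedef struct { int valid; uint32_t addr; } filter_entry;
static filter_entry filter[FILTER_ENTRIES];

/* Returns 1 if the check event must be delivered to the lifeguard,
   0 if an identical check was already seen and can be filtered out. */
int filter_check_event(uint32_t addr) {
    filter_entry *e = &filter[(addr >> 2) % FILTER_ENTRIES];
    if (e->valid && e->addr == addr)
        return 0;              /* redundant: same parameters, same metadata */
    e->valid = 1;              /* remember this check, evicting any old entry */
    e->addr = addr;
    return 1;
}

/* Metadata changed (e.g., free() or a synchronization event): forget everything. */
void filter_invalidate_all(void) {
    memset(filter, 0, sizeof filter);
}
```

A lifeguard such as LockSet would, under this model, keep separate filters for loads and stores so that the first load and the first store to a location are each checked once between synchronization events.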
Outline
• Introduction
• Background
• Three Hardware Acceleration Techniques
• Experimental Evaluation
– Log-Based Architectures (LBA)
– Simulation Study (w/ reduced input sets)
– PIN-based Analysis (w/ full inputs)
• Conclusion
Log-Based Architectures
[Figure: the lifeguard-support diagram (unmodified application, rare and frequent events, software lifeguard event handlers that update and check metadata), with event capture and delivery provided by the Log-Based Architecture (LBA)]
Idea: Exploiting Chip Multiprocessors
[Figure: a chip multiprocessor with 16 processor cores (P) and LBA components]
Simulation Setup: Dual-Core LBA System
• Extends Virtutech Simics; operating system: Fedora Core 5
• Core 1 runs the application: capture, compress
• Log transport (e.g., L2 cache)
• Core 2 runs the lifeguard: decompress, IT & IF, dispatch, M-TLB
• Application and lifeguard are processes
• Application is stalled when log buffer is full
• Model a 2-level cache hierarchy
Overall Performance: TaintCheck
[Figure: slowdowns (y-axis 0.0 to 5.0) of LBA baseline and LBA optimized for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and the average (Avg); one bar is labeled 1.36X]
Slowdown = (application execution time w/ lifeguard) / (application execution time w/o lifeguard)
Applying Our Techniques One by One
[Figure: average slowdowns (y-axis 0.0 to 10.0) for AddrCheck, MemCheck, TaintCheck, TaintCheck w/ detailed tracking, and LockSet under the configurations BASE, MTLB, MTLB+IT, MTLB+IF, and MTLB+IT+IF; bar labels range from 1.02 to 7.80]
• IT, IF, and M-TLB are indeed complementary
• Achieve dramatically better performance
PIN-Based Analysis: IT
[Figure: reduced update events (%), y-axis 0 to 100, for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, and vpr]
• IT removes 35.8% to 82.0% of the propagation events
PIN-Based Analysis: IF
[Figure: two panels, AddrCheck and LockSet, plotting reduced check events (%), y-axis 0 to 80, against the number of filter entries (8, 16, 32, 64, 128, 256) for 1-way, 2-way, 4-way, 8-way, 16-way, and fully-associative filters]
• IF can effectively reduce check events
• 4-way works as well as fully-associative
Conclusion
• Our focus: Instruction-Grain Lifeguards
• Three complementary hardware techniques:
– Metadata-TLB (M-TLB)
– Inheritance Tracking (IT)
– Idempotent Filters (IF)
• Flexible to support a wide range of lifeguards
– Reducing overheads by 2-3X in our experiments
– Achieving 2-51% overheads for all but MemCheck
Thank you!
People Working on LBA Project
Intel Research:
• Shimin Chen
• Phillip B. Gibbons
• Mike Kozuch
• Michael Ryan
University Faculty:
• Babak Falsafi (EPFL)
• Todd C. Mowry (CMU)
• Vijaya Ramachandran (UT Austin)
CMU Students:
• Michelle Goodstein
• Olatunji Ruwase
• Theodoros Strigkos
• Evangelos Vlachos
Previous Contributors:
• Bin Lin (Northwestern)
• Limor Fix (IRP)
• Radu Teodorescu (UIUC)
• Steve Schlosser (IRP)
• Anastasia Ailamaki (CMU)
• Greg Ganger (CMU)