Flexible Hardware Acceleration for Instruction-Grain Program Monitoring

Shimin Chen
Joint work with Michael Kozuch (1), Theodoros Strigkos (2), Babak Falsafi (3), Phillip B. Gibbons (1), Todd C. Mowry (1,2), Vijaya Ramachandran (4), Olatunji Ruwase (2), Michael Ryan (1), Evangelos Vlachos (2)
(1) Intel Research Pittsburgh  (2) CMU  (3) EPFL  (4) UT Austin

Instruction-Grain Monitoring
• Software often contains bugs
  – Memory corruptions, data races, …, crashes
  – Security attacks are often designed to exploit bugs
• Instruction-grain lifeguards can help
  – Dynamic monitoring: during application execution
  – Instruction-grain: e.g., memory access, data flow
• Enables a wide range of powerful lifeguards
[Diagram: Application monitored by Lifeguard]

Example Instruction-Grain Lifeguards
• AddrCheck: [Nethercote’04]
  – Monitor malloc/free, memory accesses
  – Check if all memory accesses visit allocated memory regions
• MemCheck: AddrCheck + check uninitialized values [Nethercote & Seward ’03 ’07]
  – Copying partially uninitialized structures is not an error
  – Lazy error detection to avoid many false positives
  – Track propagation of uninitialized values
• TaintCheck: detect overwrite-based security exploits [Newsome & Song’05]
  – Tainted data: data from network or disk
  – Track propagation of tainted data to detect violations
• LockSet: detect data races in parallel programs [Savage et al.’97]

Design Space of Support Platform
[Figure: performance (good to poor) plotted against generality (specific lifeguard to general purpose: wide range of lifeguards)]
• Lifeguard-specific hardware: good performance, but tied to a specific lifeguard
  [Crandall & Chong’04], [Dalton et al’07], [Shetty et al’06], [Shi et al’06], [Suh et al’04], [Venkataramani’07], [Venkataramani’08], [Zhou et al’07]
• General-purpose HW improving DBI: 3-8X slowdowns
  [Chen et al’06], [Corliss’03]
• Dynamic binary instrumentation (DBI): general purpose, but 10-100X slowdowns
  [Bruening’04], [Luk et al’05], [Nethercote’04]
• This paper: aims for good performance while staying general purpose

Outline
• Introduction
• Background
• Three Hardware Acceleration Techniques
• Experimental Evaluation
• Conclusion

Example Lifeguard: TaintCheck [Newsome & Song’05]
Purpose: detect overwrite-based security exploits
  – Metadata kept for application memory and registers
  – Tainted data: data from network or disk
  – Track taint propagation
  – Detect violation: e.g., tainted jump target address

  Application        TaintCheck Lifeguard
  mov %eax ← A       taint(%eax) = taint(A)
  mov B ← %eax       taint(B) = taint(%eax)
  add %ebx ← D       taint(%ebx) |= taint(D)
  jmp *(F)           if (taint(F)==1) error;
Detect exploit before attack code takes control

TaintCheck w/ Detailed Tracking
• TaintCheck:
  – Detect violation
  – 1 taint bit / application byte
• TaintCheck w/ detailed tracking: [Newsome & Song’05]
  – Construct taint propagation trail
  – More detailed metadata per application location
    • PC of instruction that tainted this location
    • “tainted from” address
  [Diagram: taint propagation trail from Input to Violation]
• Not supported by previous lifeguard-specific HW

Instruction-Grain Lifeguard Metadata Characteristics
• Organization varies
  – per application byte/word
  – size, format, semantics vary greatly
• Frequently updated
  – e.g., propagation tracking
• Frequently checked
  – e.g., memory accesses

Lifeguard Support
[Diagram: Application (unmodified) → Events → Lifeguard (software) event handlers → metadata]
• Rare events: e.g., malloc/free, system calls
• Frequent events: e.g., memory access, data movement
• Frequent handlers perform (1) metadata mapping, (2) metadata update, (3) metadata check
• Event capture and delivery: General-Purpose HW improving DBI
Performance bottlenecks: metadata mapping, updates, and checks

Our Contributions
[Diagram: same event flow as the Lifeguard Support slide, with M-TLB on metadata mapping, IT on Update, and IF on Check]
• Metadata-TLB (M-TLB): for metadata mapping
• Inheritance Tracking (IT): for metadata updates
• Idempotent Filters (IF): for metadata checks

Outline
• Introduction
• Background
• Three Hardware Acceleration Techniques
  – Metadata-TLB
  – Inheritance Tracking
  – Idempotent Filters
• Experimental Evaluation
• Conclusion

Metadata-TLB: Motivation
• Metadata per app byte/word
  – Element size may vary
• Two-level structure: Level-1 index, Level-2 chunks
  – Robustness & space efficiency
• Mapping: application address → metadata address
  – Frequently used in almost every handler
  – Can be very costly

Example (TaintCheck)
void dest_reg_op_mem_4B (UINT32 src_addr /*%eax*/, UINT32 dest_reg /*%edx*/)
// app instruction type: dest_reg ← dest_reg op mem(src_addr)
// handler operation:    reg_taint(dest_reg) |= mem_taint(src_addr)
{
    map *mp = level1_index[src_addr>>16];
        mov %eax, %ecx
        shr $16, %ecx
        mov level1_index(,%ecx,4), %ecx
    int idx = (src_addr & 0xffff)>>2;
        and $0xffff, %eax
        shr $2, %eax
    UChar mem_taint = mp[idx];
        movzbl (%ecx,%eax,1), %eax
    reg_taint[dest_reg] |= mem_taint;
        or %al, reg_taint(%edx)
    nlba ();
        nlba
}
Metadata mapping takes 5 out of 8 instructions!
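The two-level mapping in the handler above can be written out as a small C sketch. It is not the authors' implementation: it assumes 32-bit application addresses, one taint byte per 4-byte application word (matching the `>>2` in the slide), and lazily allocated level-2 chunks. The helper names `metadata_addr` and `set_mem_taint`, the chunk size, and the 8-entry register array are hypothetical; `level1_index`, `reg_taint`, and `dest_reg_op_mem_4B` follow the slide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout (illustrative): the upper 16 address bits index the
 * level-1 table; each level-2 chunk holds one taint byte per 4-byte
 * application word, i.e. 2^16 / 4 = 16384 entries. */
#define L1_ENTRIES    (1u << 16)
#define CHUNK_ENTRIES (1u << 14)

uint8_t *level1_index[L1_ENTRIES];  /* NULL until a chunk is first needed */
uint8_t  reg_taint[8];              /* one taint byte per register        */

/* Map an application address to the address of its metadata byte,
 * allocating the level-2 chunk on first use. Returns NULL if the
 * allocation fails. */
uint8_t *metadata_addr(uint32_t app_addr)
{
    uint32_t hi = app_addr >> 16;
    if (level1_index[hi] == NULL) {
        level1_index[hi] = calloc(CHUNK_ENTRIES, 1);
        if (level1_index[hi] == NULL)
            return NULL;
    }
    return &level1_index[hi][(app_addr & 0xffffu) >> 2];
}

/* Hypothetical helper: record the taint of one application word.
 * Returns 0 on success, -1 if the chunk could not be allocated. */
int set_mem_taint(uint32_t app_addr, uint8_t taint)
{
    uint8_t *p = metadata_addr(app_addr);
    if (p == NULL)
        return -1;
    *p = taint;
    return 0;
}

/* Handler for: dest_reg <- dest_reg op mem(src_addr).
 * The mapping step is what an LMA-style lookup would replace. */
void dest_reg_op_mem_4B(uint32_t src_addr, uint32_t dest_reg)
{
    uint8_t *p = metadata_addr(src_addr);
    if (p != NULL)
        reg_taint[dest_reg] |= *p;
}
```

For example, after `set_mem_taint(0x12345678u, 1)`, calling `dest_reg_op_mem_4B(0x12345678u, 3)` sets `reg_taint[3]` to 1. The shift, mask, and table load inside `metadata_addr` are the work the slide counts as 5 of the handler's 8 instructions.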

Our Solution: Metadata-TLB
• A TLB-like HW associative lookup table
• LMA (Load Metadata Address) instruction:
  – Application address → lifeguard metadata address
• Managed by (user-mode) lifeguard software

Example (TaintCheck) w/ M-TLB
void dest_reg_op_mem_4B (UINT32 src_addr /*%eax*/, UINT32 dest_reg /*%edx*/)
// app instruction type: dest_reg ← dest_reg op mem(src_addr)
// handler operation:    reg_taint(dest_reg) |= mem_taint(src_addr)

Mapping code replaced by LMA:
    map *mp = level1_index[src_addr>>16];
        mov %eax, %ecx
        shr $16, %ecx
        mov level1_index(,%ecx,4), %ecx
    int idx = (src_addr & 0xffff)>>2;
        and $0xffff, %eax
        shr $2, %eax
    UChar mem_taint = mp[idx];
        movzbl (%ecx,%eax,1), %eax

Handler with LMA:
{
    UChar *p = LMA_macro(src_addr);
        LMA %eax, %ecx
    UChar mem_taint = *p;
        mov (%ecx), %al
    reg_taint[dest_reg] |= mem_taint;
        or %al, reg_taint(%edx)
    nlba ();
        nlba
}
Reduces handler size by half!

Inheritance Tracking: Motivation
• Propagation tracking is expensive
  – Metadata updates for almost every app instruction
• Previous hardware solutions track propagation
  – Automatically update metadata in hardware
  – Problem: only support simple metadata semantics
    • e.g., do not support TaintCheck w/ detailed tracking
• Our goal: flexibility AND performance
• Idea: inheritance structure is common, so let’s track inheritance in hardware!

Problem with General Inheritance Tracking

  Application      Propagation Tracking        Inheritance Tracking
  mov %eax ← A     taint(%eax) = taint(A)      %eax inherits from A
  mov B ← %eax     taint(B) = taint(%eax)      B inherits from %eax
  add %ebx ← D     taint(%ebx) |= taint(D)     insert D into %ebx’s inherit-from list
Problem: state explosion for binary operations!

Unary Inheritance Tracking
• Many lifeguards can take advantage of unary IT:
  – MemCheck
  – TaintCheck
• Large performance improvements if used
  – Can be disabled if unary IT does not match the lifeguard

Tracking Register Inheritance
[Diagram: original event → state transition & event to deliver, using the IT table for registers (IT(%rs), IT(%rd)) → transformed event → deliver event]
More details in the paper:
• IT table and state transition table details
• Conflict detection

Example

  Application      Before               Inheritance Tracking
  mov %eax ← A     mem_to_reg
  mov B ← %eax     reg_to_mem           mem_to_mem
  mov %ebx ← C     mem_to_reg
  add %ebx ← D     dest_reg_op_mem
  mov E ← %ebx     reg_to_mem           imm_to_mem
Can significantly reduce metadata update events!
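The event reduction in the example above can be mimicked with a small software model in C. This is only a sketch under simplifying assumptions, not the hardware design: each register either inherits from one memory address or from nothing, a binary operation simply ends the register's inheritance, and the conflict detection and full state transition table that the paper covers are not modeled. All type and function names here (`it_entry`, `it_mem_to_reg`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state and event names, chosen to mirror the slide. */
enum it_state   { IT_NONE, IT_FROM_MEM };
enum event_kind { EV_NOTHING, EV_MEM_TO_MEM, EV_IMM_TO_MEM };

struct it_entry { enum it_state state; uint32_t src_addr; };
struct event    { enum event_kind kind; uint32_t src_addr; uint32_t dst_addr; };

#define NUM_REGS 8
struct it_entry it_table[NUM_REGS];   /* zero-initialized: IT_NONE */

/* reg <- mem[addr]: remember the source, deliver nothing yet. */
struct event it_mem_to_reg(unsigned reg, uint32_t addr)
{
    it_table[reg].state = IT_FROM_MEM;
    it_table[reg].src_addr = addr;
    return (struct event){ EV_NOTHING, 0, 0 };
}

/* reg <- reg op mem[addr]: a binary operation, so the register no longer
 * inherits from a single source. Simplification: the operand is ignored
 * and no event is delivered, as in the slide's example. */
struct event it_dest_reg_op_mem(unsigned reg, uint32_t addr)
{
    (void)addr;
    it_table[reg].state = IT_NONE;
    return (struct event){ EV_NOTHING, 0, 0 };
}

/* mem[addr] <- reg: deliver one transformed event to the lifeguard. */
struct event it_reg_to_mem(uint32_t addr, unsigned reg)
{
    if (it_table[reg].state == IT_FROM_MEM)
        return (struct event){ EV_MEM_TO_MEM, it_table[reg].src_addr, addr };
    return (struct event){ EV_IMM_TO_MEM, 0, addr };
}
```

Feeding the five instructions of the example through these functions yields two delivered events (one `EV_MEM_TO_MEM`, one `EV_IMM_TO_MEM`) instead of five, which is the reduction the slide illustrates. A real design must also handle the case where the source location is overwritten before the register is stored.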

Idempotent Filters: Idea
• Typically, metadata checks give the same result if
  – Event parameters are the same and
  – Metadata are the same
• Idea: filter out idempotent (redundant) events
• For example:
  – AddrCheck:
    • After checking that a memory location is allocated
    • Subsequent loads/stores to the same location are safe
    • Until the next free() event
  – LockSet: (surprisingly)
    • In between synchronization events (e.g., lock/unlock)
    • Check first load to a location
    • Check first store to a location

Outline
• Introduction
• Background
• Three Hardware Acceleration Techniques
• Experimental Evaluation
  – Log-Based Architectures (LBA)
  – Simulation Study (w/ reduced input sets)
  – PIN-based Analysis (w/ full inputs)
• Conclusion

Log-Based Architectures
[Diagram: same event flow as the Lifeguard Support slide; event capture and delivery is provided by the Log-Based Architecture (LBA)]

Idea: Exploiting Chip Multiprocessors
[Diagram: a chip multiprocessor with 16 cores (P) plus LBA components]

Simulation Setup: Dual-Core LBA System
• Operating system: Fedora Core 5
• Simulator: extended Virtutech Simics
Diagram labels: Application on Core 1 (capture), Lifeguard on Core 2 (dispatch, M-TLB), log transport (e.g.
L2 cache), compress, decompress, IT & IF
• Application and lifeguard are processes
• Application is stalled when log buffer is full
• Model a 2-level cache hierarchy

Overall Performance: TaintCheck
[Figure: slowdowns (0.0 to 5.0) for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and Avg; series: LBA baseline, LBA optimized; value labeled on the chart: 1.36X]
Slowdown = (application execution time w/ lifeguard) / (application execution time w/o lifeguard)

Applying Our Techniques One by One
[Figure: average slowdowns (0.0 to 10.0) for AddrCheck, MemCheck, TaintCheck, TaintCheck w/ detailed tracking, and LockSet; configurations: BASE, MTLB, MTLB+IT, MTLB+IF, MTLB+IT+IF. Bar labels recovered from the slide, with their mapping to bars lost in extraction: 7.80, 6.05, 1.40, 1.51, 2.71, 3.36, 2.29, 1.36, 1.90, 1.02, 4.25, 3.20, 4.21, 3.81, 3.27, 3.23]
• IT, IF, and M-TLB are indeed complementary
• Achieve dramatically better performance

PIN-Based Analysis: IT
[Figure: reduced update events (%) for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr]
• IT removes 35.8% to 82.0% of the propagation events

PIN-Based Analysis: IF
[Figure: two panels, AddrCheck and LockSet; reduced check events (%) versus number of filter entries (8, 16, 32, 64, 128, 256); series: fully-assoc, 16-way, 8-way, 4-way, 2-way, 1-way]
• IF can effectively reduce check events
• 4-way works as well as fully-associative

Conclusion
• Our focus: Instruction-Grain Lifeguards
• Three complementary hardware techniques:
  – Metadata-TLB (M-TLB)
  – Inheritance Tracking (IT)
  – Idempotent Filters (IF)
• Flexible to support a wide range of lifeguards
  – Reducing overheads by 2-3X in our experiments
  – Achieving 2-51% overheads for all but MemCheck

Thank you!

People Working on LBA Project
Intel Research:
• Shimin Chen
• Phillip B. Gibbons
• Mike Kozuch
• Michael Ryan
University Faculty:
• Babak Falsafi (EPFL)
• Todd C. Mowry (CMU)
• Vijaya Ramachandran (UT Austin)
CMU Students:
• Michelle Goodstein
• Olatunji Ruwase
• Theodoros Strigkos
• Evangelos Vlachos
Previous Contributors:
• Bin Lin (Northwestern)
• Limor Fix (IRP)
• Radu Teodorescu (UIUC)
• Steve Schlosser (IRP)
• Anastasia Ailamaki (CMU)
• Greg Ganger (CMU)
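
Supplementary sketch: the Idempotent Filters idea from the AddrCheck example earlier in the talk can be modeled in software as a small cache of recently checked addresses. The C sketch below is an illustration under stated assumptions, not the hardware design: it is direct-mapped (the 1-way point in the IF analysis), uses 64 entries, and indexes by word address. The names `idem_filter`, `filter_clear`, and `filter_should_deliver` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FILTER_ENTRIES 64   /* illustrative; the slides sweep 8 to 256 */

struct idem_filter {
    uint32_t addr[FILTER_ENTRIES];
    bool     valid[FILTER_ENTRIES];
};

/* Invalidate every entry. For AddrCheck this would run on events that
 * change the metadata, such as free(). */
void filter_clear(struct idem_filter *f)
{
    memset(f, 0, sizeof *f);
}

/* Returns true if the check event must be delivered to the lifeguard,
 * false if an identical check was already seen since the last clear
 * and is still cached. */
bool filter_should_deliver(struct idem_filter *f, uint32_t addr)
{
    unsigned slot = (addr >> 2) % FILTER_ENTRIES;
    if (f->valid[slot] && f->addr[slot] == addr)
        return false;               /* redundant check: filter it out */
    f->valid[slot] = true;          /* remember this check, evicting  */
    f->addr[slot]  = addr;          /* whatever occupied the slot     */
    return true;
}
```

A miss never causes a wrong answer, only an extra delivered event, so a small filter with conflict evictions is safe. Clearing the whole filter on free() is the conservative choice assumed here; a finer-grained invalidation would filter more events.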