Process Virtualization and Symbiotic Optimization
Kim Hazelwood
ACACES Summer School, July 2009

About Your Instructor
Currently
– Assistant Professor at University of Virginia
– Faculty Consultant at Intel
Previously
– PostDoc at Intel (2004-2005)
– PhD from Harvard (2004)
– Four summer internships (HP & IBM)
– Worked with Dynamo, Jikes RVM, …
Other Interests
– Marathons (Boston, NYC, Disney)
– Reality TV Shows
– Family (8 month old at home!)

ACACES 2009 – Process Virtualization

About the Course
• Day 1 – What is Process Virtualization?
• Day 2 – Building Process Virtualization Systems
• Day 3 – Using Process Virtualization Systems
• Day 4 – Symbiotic Optimization
• We’ll use Pin as a case study: www.pintool.org
• You’ll have homework!

What is Process Virtualization?
System virtualization – allows multiple OSes to share the same hardware
Process virtualization – runs as a normal application (on top of an OS) and supports a single process
[Figure: System Virtualization: App1 on OS1 and App2 on OS2, above a VMM, above HW. Process Virtualization: App1 on a DBT and App2 on a DBI, above a single OS, above HW.]

Classifying Virtualization
Dynamic binary optimization (x86 → x86--)
• Complement the static compiler
– User inputs, phases, DLLs, hardware features
– Examples: DynamoRIO, Mojo, Strata
Dynamic translation (x86 → PPC)
• Convert applications to run on a new architecture
– Examples: Rosetta, Transmeta CMS, DAISY
Dynamic instrumentation (x86 → x86++)
• Inspect/add features to existing applications
– Examples: Pin, Valgrind

A Simple Example of Instrumentation
Inserting extra code into a program to collect runtime information

    counter++;
    sub $0xff, %edx
    counter++;
    cmp %esi, %edx
    counter++;
    jle <L1>
    counter++;
    mov $0x1, %edi
    counter++;
    add $0x10, %eax

Instruction Count Output

    $ /bin/ls
    Makefile imageload.out itrace proccount
    imageload inscount atrace itrace.out
    $ pin -t inscount.so -- /bin/ls
    Makefile imageload.out itrace proccount
    imageload inscount atrace itrace.out
    Count 422838

A Simple Example of Optimization
On Pentium 3, inc is faster than add
On Pentium 4, add is faster than inc

    With inc              With add
    sub $0xff, %edx       sub $0xff, %edx
    cmp %esi, %edx        cmp %esi, %edx
    jle <L1>              jle <L1>
    mov $0x1, %edi        mov $0x1, %edi
    inc %eax              add $0x1, %eax

Research Applications
Computer Architecture
• Trace Generation
• Fault Tolerance Studies
• Emulating New Instructions
Multicore
• Thread analysis
– Thread profiling
– Race detection
• Cache simulations
Program Analysis
• Code coverage
• Call-graph generation
• Memory-leak detection
• Instruction profiling
Compilers
• Compare programs from competing compilers
Security
• Add security checks and features

Approaches
• Source modification:
– Modify source programs
• Binary modification:
– Modify executables directly
Advantages for binary modification
• Language independent
• Machine-level view
• Modify legacy/proprietary software

Static vs Dynamic Approaches
Dynamic approaches are more robust
• No need to recompile or relink
• Discover code at runtime
• Handle dynamically-generated code
• Attach to running processes

The Code Discovery Problem on x86
• Indirect jump to ??
• Data interspersed with code
• Pad for alignment
[Figure: code layout in memory: Instr 1, Instr 2, Instr 3, Jump Reg, DATA, Instr 5, Instr 6, Uncond Branch, PADDING, Instr 8]

Dynamic Modification: Approaches
JIT Mode
• Create a modified copy of the application on-the-fly
• Original code never executes
More flexible, more common approach
Probe Mode
• Modifies the original application instructions
• Inserts jumps to modified code (trampolines)
Lower overhead (less flexible) approach

JIT-Mode Binary Modification
Generate and cache modified copies of instructions
[Figure: EXE, Transform, Code Cache, Execute, Profile]
Modified (cached) instructions are executed in lieu of original instructions

JIT-Mode Instrumentation
[Figure: original code blocks 1 to 7; code cache holds 1’, 2’, 7’; exits point back to VMM]
Fetch trace starting block 1 and start instrumentation

Pin JIT-Mode Instrumentation
[Figure: original code blocks 1 to 7; code cache holds 1’, 2’, 7’]
Transfer control into code cache (block 1)

Pin JIT-Mode Instrumentation
[Figure: original code blocks 1 to 7; code cache holds 1’, 2’, 7’ and a new trace 3’, 5’, 6’; trace linking]
Fetch and instrument a new trace

Pin Instrumentation Approaches
JIT Mode
• Create a modified copy of the application on-the-fly
• Original code never executes
More flexible, more common approach
Probe Mode
• Modify the original application instructions
• Insert jumps to instrumentation code (trampolines)
Lower overhead (less flexible) approach

A Sample Probe
• A probe is a jump instruction that overwrites original instruction(s) in the application
– Copy/translate original bytes so probed functions can be called

Original function entry point:

    0x400113d4: push %ebp
    0x400113d5: mov %esp,%ebp
    0x400113d7: push %edi
    0x400113d8: push %esi
    0x400113d9: push %ebx

Entry point overwritten with probe:

    0x400113d4: jmp 0x41481064
    0x400113d9: push %ebx

Copy of entry point w/ original bytes:

    0x50000004: push %ebp
    0x50000005: mov %esp,%ebp
    0x50000007: push %edi
    0x50000008: push %esi
    0x50000009: jmp 0x400113d9

Probe Instrumentation
Advantages:
• Low overhead – few percent
• Less intrusive – execute original code
Disadvantages:
• More tool writer responsibility
• Restrictions on where to modify (routine-level)

Probe Tool Writer Responsibilities
No control flow into the instruction space where probe is placed
• 6 bytes on IA32, 7 bytes on Intel64, bundle on IA64
• Branch into “replaced” instructions will fail
• Probes at function entry point only
Thread safety for insertion/deletion of probes
• During image load callback is safe
• Only loading thread has a handle to the image
Replacement function has same behavior as original

Probe vs. JIT Summary

                          Probes                                 JIT
    Overhead              Few percent                            50% or higher
    Intrusive             Low                                    High
    Granularity           Function boundary                      Instruction
    Safety & Isolation    More responsibility for tool writer    High

Process Virtualization Systems
Readily Available
• DynamoRIO
• Valgrind
• Pin
Available By Request
• Strata
• Adore
Unavailable
• Transmeta CMS
• Dynamo

DynamoRIO

Valgrind

Pin

Intel Pin
Dynamic Instrumentation:
• No need for source code, recompilation, or post-linking
Programmable Instrumentation:
• Provides rich APIs for writing your own instrumentation tools (called Pintools) in C/C++
Multiplatform:
• Supports x86, x86-64, Itanium, Xscale
• Supports Linux, Windows, MacOS
Robust:
• Instruments real-life applications: Database, web browsers, …
• Instruments multithreaded applications
• Supports signals
Efficient:
• Applies compiler optimizations on
  instrumentation code

Using Pin
Launch and instrument an application

    $ pin -t pintool.so -- application

– pin: instrumentation engine (provided in the kit)
– pintool.so: instrumentation tool (write your own, or use one provided in the kit)
Attach to and instrument an application

    $ pin -t pintool.so -pid 1234

Pin Instrumentation APIs
Basic APIs are architecture independent:
• Provide common functionalities like determining:
– Control-flow changes
– Memory accesses
Architecture-specific APIs
• e.g., Info about opcodes and operands
Call-based APIs:
• Instrumentation routines
• Analysis routines

Instrumentation vs. Analysis
Concepts borrowed from the ATOM tool:
Instrumentation routines define where instrumentation is inserted
• e.g., before instruction
• Occurs the first time an instruction is executed
Analysis routines define what to do when instrumentation is activated
• e.g., increment counter
• Occurs every time an instruction is executed

Pintool 1: Instruction Count

    counter++;
    sub $0xff, %edx
    counter++;
    cmp %esi, %edx
    counter++;
    jle <L1>
    counter++;
    mov $0x1, %edi
    counter++;
    add $0x10, %eax

Pintool 1: Instruction Count Output

    $ /bin/ls
    Makefile imageload.out itrace proccount
    imageload inscount0 atrace itrace.out
    $ pin -t inscount0.so -- /bin/ls
    Makefile imageload.out itrace proccount
    imageload inscount0 atrace itrace.out
    Count 422838

ManualExamples/inscount0.cpp

    #include <iostream>
    #include "pin.H"

    UINT64 icount = 0;

    // Analysis routine: runs every time an instrumented instruction executes.
    void docount() { icount++; }

    // Instrumentation routine: decides where the analysis call is inserted.
    void Instruction(INS ins, void *v)
    {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
    }

    // Called when the application exits.
    void Fini(INT32 code, void *v)
    {
        std::cerr << "Count " << icount << std::endl;
    }

    int main(int argc, char * argv[])
    {
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();  // starts the application; never returns
        return 0;
    }

Pintool 2: Instruction Trace

    Print(ip);
    sub $0xff, %edx
    Print(ip);
    cmp %esi, %edx
    Print(ip);
    jle <L1>
    Print(ip);
    mov $0x1, %edi
    Print(ip);
    add $0x10, %eax

Need to pass the ip argument to the analysis routine (printip())

Pintool 2: Instruction Trace Output

    $ pin -t itrace.so -- /bin/ls
    Makefile imageload.out itrace proccount
    imageload inscount0 atrace itrace.out
    $ head -4 itrace.out
    0x40001e90
    0x40001e91
    0x40001ee4
    0x40001ee5

ManualExamples/itrace.cpp

    #include <stdio.h>
    #include "pin.H"

    FILE * trace;

    // Analysis routine: receives the instruction pointer as its argument.
    void printip(void *ip) { fprintf(trace, "%p\n", ip); }

    // Instrumentation routine: IARG_INST_PTR is the argument passed to printip.
    void Instruction(INS ins, void *v)
    {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END);
    }

    void Fini(INT32 code, void *v) { fclose(trace); }

    int main(int argc, char * argv[])
    {
        trace = fopen("itrace.out", "w");
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();  // starts the application; never returns
        return 0;
    }

Examples of Arguments to Analysis Routine
IARG_INST_PTR
– Instruction pointer (program counter) value
IARG_UINT32 <value>
– An integer value
IARG_REG_VALUE <register name>
– Value of the register specified
IARG_BRANCH_TARGET_ADDR
– Target address of the branch instrumented
IARG_MEMORY_READ_EA
– Effective address of a memory read
And many more … (refer to the manual for details)

Instrumentation Points
Instrument points relative to an instruction:
• Before: IPOINT_BEFORE
• After:
– Fall-through edge: IPOINT_AFTER
– Taken edge: IPOINT_TAKEN_BRANCH

    cmp %esi, %edx
    jle <L1>
    mov $0x1, %edi
    <L1>:
    mov $0x8,%edi

[Figure: three count() calls marking instrumentation points in the code above]

Instrumentation Granularity
Instrumentation can be done at three different granularities:
• Instruction
• Basic block
– A sequence of instructions terminated at a control-flow changing instruction
– Single entry, single exit
• Trace
– A sequence of basic blocks terminated at an unconditional control-flow changing instruction
– Single entry, multiple exits

    sub $0xff, %edx
    cmp %esi, %edx
    jle <L1>
    mov $0x1, %edi
    add $0x10, %eax
    jmp <L2>

1 Trace, 2 BBs, 6 insts

Pintool 3: Faster Instruction Count
One counter update per basic block (bbl):

    counter += 3
    sub $0xff, %edx
    cmp %esi, %edx
    jle <L1>
    counter += 2
    mov $0x1, %edi
    add $0x10, %eax

ManualExamples/inscount1.cpp

    #include <stdio.h>
    #include "pin.H"

    UINT64 icount = 0;

    // Analysis routine: adds a whole block's instruction count in one call.
    void docount(UINT32 c) { icount += c; }

    // Instrumentation routine: inserts one call per basic block in the trace.
    void Trace(TRACE trace, void *v)
    {
        for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        {
            BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount,
                           IARG_UINT32, BBL_NumIns(bbl), IARG_END);
        }
    }

    void Fini(INT32 code, void *v)
    {
        fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
    }

    int main(int argc, char * argv[])
    {
        PIN_Init(argc, argv);
        TRACE_AddInstrumentFunction(Trace, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();  // starts the application; never returns
        return 0;
    }

What Did We Learn Today?
• Overview of Process Virtualization
• Approaches
– Source vs. Binary
– Static vs. Dynamic
– JIT vs. Probes
• Three Available Systems
• Three Simple Examples

Want More Info?
• Read Jim Smith’s book: Virtual Machines
• Download one (or more) of them!

    Pin          www.pintool.org
    DynamoRIO    code.google.com/p/dynamorio
    Valgrind     www.valgrind.org

Day 1 – What is Process Virtualization?
Day 2 – Building Process Virtualization Systems
Day 3 – Using Process Virtualization Systems
Day 4 – Symbiotic Optimization
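
Recap sketch: why Pintool 3 is faster
The speedup in Pintool 3 comes from calling the analysis routine once per basic block instead of once per instruction. The standalone C++ sketch below illustrates only that bookkeeping. It does not use the Pin API: Insn, CountResult, SplitIntoBlocks, CountPerInstruction, CountPerBlock and SlideTrace are hypothetical names invented for this illustration, and it assumes every block runs to completion on each pass (the branch always falls through).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for an instruction (hypothetical type, not part of the Pin API).
struct Insn {
    std::string text;   // assembly text, kept only for readability
    bool ends_block;    // true if the instruction changes control flow
};

// What a counting tool would report, plus how often its analysis routine ran.
struct CountResult {
    std::uint64_t icount;
    std::uint64_t analysis_calls;
};

// Split a trace into basic-block sizes; a block ends at each control-flow instruction.
std::vector<std::uint32_t> SplitIntoBlocks(const std::vector<Insn>& trace) {
    std::vector<std::uint32_t> sizes;
    std::uint32_t current = 0;
    for (const Insn& insn : trace) {
        ++current;
        if (insn.ends_block) {
            sizes.push_back(current);
            current = 0;
        }
    }
    if (current > 0) sizes.push_back(current);  // trailing block without a branch
    return sizes;
}

// Per-instruction counting: one analysis call for every executed instruction.
CountResult CountPerInstruction(const std::vector<Insn>& trace, std::uint64_t runs) {
    CountResult r{0, 0};
    for (std::uint64_t i = 0; i < runs; ++i) {
        for (std::size_t k = 0; k < trace.size(); ++k) {
            r.icount += 1;
            r.analysis_calls += 1;
        }
    }
    return r;
}

// Per-block counting: one analysis call per executed block, adding the block size.
CountResult CountPerBlock(const std::vector<Insn>& trace, std::uint64_t runs) {
    const std::vector<std::uint32_t> sizes = SplitIntoBlocks(trace);
    std::uint64_t total = 0;
    for (std::uint32_t s : sizes) total += s;
    assert(total == trace.size());  // blocks cover every instruction exactly once
    CountResult r{0, 0};
    for (std::uint64_t i = 0; i < runs; ++i) {
        for (std::uint32_t s : sizes) {
            r.icount += s;
            r.analysis_calls += 1;
        }
    }
    return r;
}

// The five instructions shown on the Pintool 3 slide.
std::vector<Insn> SlideTrace() {
    return {
        {"sub $0xff, %edx", false},
        {"cmp %esi, %edx", false},
        {"jle <L1>", true},
        {"mov $0x1, %edi", false},
        {"add $0x10, %eax", false},
    };
}
```

Calling CountPerInstruction(SlideTrace(), 10) and CountPerBlock(SlideTrace(), 10) yields the same instruction count (50) in both cases, but the per-block version makes 20 analysis calls rather than 50.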