Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation

Itai Gurari
gurari@cs.wisc.edu
Computer Science Department
University of Wisconsin
1210 W. Dayton St.
Madison, WI 53706-1685

Paradyn/Condor Week
Madison, WI
March 12-14, 2001

Introduction

Dynamic Instrumentation:
• Insert instrumentation into an application while it is executing
• Used by Paradyn to gather performance data
• Paradyn instrumentation is inserted at three types of points
  – function entry, exit, and call

Paradyn Instrumentation Points

    Point    Executable Code    Instrumentation
    Entry    foo () {           startTimer()
    Call       call <bar>       counter++
    Exit     }                  stopTimer()

Goal

Transfer from function to instrumentation code as quickly as possible

Control Transfer

To switch execution from a function to its instrumentation code:
– Overwrite instructions in the function with a control transfer instruction.
– Equivalents of the overwritten instructions are copied to the code patch area.
– On the x86, Paradyn uses, by default, a 5-byte jump to transfer control to the instrumentation code.
  • 5-byte jump range is the whole address space
– If a 5-byte instruction won't fit, we use a 1-byte trap (int3 instruction).
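The two control transfer instructions named above can be sketched at the byte level. This is a minimal illustration, not Paradyn's code: the function names and the `room` parameter are hypothetical, and a 32-bit address space is assumed. The opcodes themselves (0xE9 for `jmp rel32`, 0xCC for `int3`) are the standard x86 encodings.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jump with a 32-bit displacement
INT3_OPCODE = 0xCC       # x86 one-byte breakpoint trap
JMP_REL32_LEN = 5        # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(site_addr, target_addr):
    """Encode a 5-byte jump placed at site_addr that lands on target_addr."""
    # The displacement is measured from the instruction after the jump.
    # Masking to 32 bits lets the jump wrap around, which is why it can
    # reach any address in a 32-bit address space.
    disp = (target_addr - (site_addr + JMP_REL32_LEN)) & 0xFFFFFFFF
    return bytes([JMP_REL32_OPCODE]) + struct.pack("<I", disp)


def control_transfer(site_addr, target_addr, room):
    """Pick the bytes to write at an instrumentation point.

    room is the number of bytes that may safely be overwritten there
    (a hypothetical input; deciding it is the hard part).
    """
    if room >= JMP_REL32_LEN:
        return encode_jmp_rel32(site_addr, target_addr)
    return bytes([INT3_OPCODE])  # fall back to the 1-byte trap
```

For example, `control_transfer(0x1000, 0x0FF0, 1)` yields the single trap byte, while the same call with `room=5` yields a backward jump.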
Inserting Control Transfer Instructions

• Dynamically rewrite function in place
• Different techniques for different types of instrumentation points

Jumps and Traps
Instrument Entry Point, Case 1

    Original    Instrumented
    push        jmp <instrumentation>
    mov
    sub

Enough room to replace instructions with a jump

Jumps and Traps
Instrument Entry Point, Case 2

    Original    With jump (not usable)    With trap
    push        jmp <instrumentation>     int3
    mov                                   mov
    jmp         jmp                       jmp

Inserting a jump instruction interferes with the target of the backwards jump
Must use a trap instruction to get to instrumentation

Jumps and Traps
Instrument Call Point

    Original      Instrumented
    call <Foo>    jmp <instrumentation>

Enough room to replace instruction with a jump

Jumps and Traps
Instrument Exit Point, Case 1

    Original    Instrumented
    mov         jmp <instrumentation>
    leave
    ret

Back up far enough to replace instructions with a jump
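The jump-or-trap decisions in the entry and exit cases can be sketched as two small checks. This is a simplified model under stated assumptions, not Paradyn's analysis: instructions are given only as lengths or (mnemonic, length) pairs, the function names are hypothetical, a preceding call is simply treated as off limits at an exit point as the slides do, and branch targets near the exit are ignored.

```python
JMP_LEN = 5  # size of the 5-byte jump


def entry_can_use_jump(instr_lengths, branch_targets):
    """Can the function entry be overwritten with a 5-byte jump?

    instr_lengths: lengths of the instructions from the entry, in order.
    branch_targets: offsets inside the function that some branch targets.
    """
    covered = 0
    for length in instr_lengths:
        if covered >= JMP_LEN:
            break
        covered += length  # whole instructions must be overwritten
    if covered < JMP_LEN:
        return False  # the function is shorter than a jump
    # A branch into the overwritten bytes would land inside the jump
    # or on leftover bytes. Offset 0 is fine: it reaches the jump itself.
    return not any(0 < t < covered for t in branch_targets)


def exit_can_use_jump(tail, bonus_bytes):
    """Can the exit be overwritten with a 5-byte jump?

    tail: (mnemonic, length) pairs ending with the ret, in program order.
    bonus_bytes: padding bytes between the ret and the next function.
    """
    room = bonus_bytes
    for mnemonic, length in reversed(tail):
        if room >= JMP_LEN:
            break
        if mnemonic == "call":
            break  # do not back up over a call
        room += length
    return room >= JMP_LEN
```

With typical encodings (push 1 byte, mov 2, sub 3, leave 1, ret 1, call 5), Entry Case 1 is `entry_can_use_jump([1, 2, 3], set())`, which allows a jump, while a backward branch to offset 1 forces a trap.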
Jumps and Traps
Instrument Exit Point, Case 2

    Original      With jump (not usable)
    call <Foo>    call
    leave         jmp <instrumentation>
    ret

Jump interferes with the preceding call

Jumps and Traps
Instrument Exit Point, Case 2a

    Original      Instrumented
    call <Foo>    call <Foo>
    leave         jmp <instrumentation>
    ret
    ? ? ?
    (beginning of next function, 4-byte boundary)

Compiler pads with “bonus bytes”
Replace instructions with a jump

Jumps and Traps
Instrument Exit Point, Case 2b

    Original      Instrumented
    call <Foo>    call <Foo>
    leave         leave
    ret           int3
    ?             ?

Not enough “bonus bytes” to overwrite with a jump (if any)
Overwrite return with a trap

Jumps and Traps
Extra slot

No jumps to first ten bytes of function

    Original    Instrumented
    push        jmp <instrumentation>
    mov
    sub
    mov         mov

    extra slot: jmp <instrumentation>

Enough space to overwrite entry with a jump
Make 2-byte jump to “extra slot”, overwrite “extra slot” with jump to instrumentation

Control Transfer
Traps on x86

• Generate an exception that is caught by either the application (Solaris, Linux) or the Paradyn daemon (Windows NT).
• Address of trap instruction is used to calculate which instrumentation code to execute.

Problem

Trap handling is slow:
• On Solaris 2.6, jumps are over 1000 times faster than traps.
• On Linux 2.2, jumps are over 200 times faster than traps.

Traps limit instrumentation:
• Can’t insert as much or at as fine a granularity

Trap handling logic is difficult:
• Susceptible to bugs
• Difficult to understand and maintain

Solution

Rewrite functions that do not have enough room for jumps into functions that do have enough room for jumps.
– Rewrite the function on the fly: combines dynamic instrumentation and binary rewriting.
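For contrast with the jump-based transfer that rewriting aims for, the trap path described under "Traps on x86" can be sketched as a table lookup keyed by the trap address. This is only an illustration: `TrapDispatcher` and its methods are hypothetical names, and the real handler runs inside a signal or exception handler rather than as ordinary Python. The one architectural fact used is that after an `int3` the saved program counter points at the byte following the trap.

```python
INT3_LEN = 1  # int3 is a single byte


class TrapDispatcher:
    """Maps the address of each planted trap to its instrumentation."""

    def __init__(self):
        self._by_addr = {}

    def register(self, trap_addr, instrumentation):
        """Record which instrumentation belongs to the trap at trap_addr."""
        self._by_addr[trap_addr] = instrumentation

    def on_trap(self, saved_pc):
        """Run the instrumentation for the trap that just fired."""
        # The saved program counter is one byte past the int3.
        trap_addr = saved_pc - INT3_LEN
        instrumentation = self._by_addr.get(trap_addr)
        if instrumentation is None:
            raise LookupError(f"no instrumentation for trap at {trap_addr:#x}")
        return instrumentation()
```

Every instrumented execution pays for the exception delivery and this lookup, whereas a jump reaches the instrumentation directly.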
Dynamic Rewriting

• Overwrite existing instructions
• Expand instrumentation points
• Relocate function

Function Rewriting and Relocation

In Paradyn we rewrite a function:
– only if the function contains an instrumentation point that would require using a trap to instrument
– the first time a request to instrument the function is made
– even if the instrumentation to be inserted is not for a point that requires using a trap
  • e.g. the exit needs a trap, the entry can use a jump, and the request is to instrument the entry

Function Rewriting and Relocation (continued)

– all instrumentation points that cannot use a jump are expanded.
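The rewriting policy above can be sketched as a single decision made on each instrumentation request. The `Point` and `Function` types and the `on_instrument_request` function are hypothetical names used only for illustration; the actual rewriting and relocation work is reduced to setting flags.

```python
from dataclasses import dataclass


@dataclass
class Point:
    name: str            # "entry", "exit", or a call site
    needs_trap: bool     # True if a 5-byte jump does not fit in place
    expanded: bool = False


@dataclass
class Function:
    points: list
    relocated: bool = False


def on_instrument_request(func, requested_point):
    """Apply the rewrite policy; return True if this request caused a rewrite.

    requested_point is deliberately not consulted: the rewrite happens on
    the first request even when that request is for a point that could
    have used a jump in place.
    """
    if func.relocated:
        return False  # already rewritten by an earlier request
    if not any(p.needs_trap for p in func.points):
        return False  # every point can use a jump in place
    for p in func.points:
        if p.needs_trap:
            p.expanded = True  # make room for a jump at this point
    func.relocated = True
    return True
```

For the example in the slide, a function whose exit needs a trap is rewritten when the entry is first instrumented, and only the exit point is expanded.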
Rewriting A Function

    Point    Original      Insert nops        Overwrite with jumps
    Entry    push          push               jmp <instrumentation>
                           nop
             mov           mov
    Call     call <Foo>    call <Foo>         call <Foo>
    Call     call <Bar>    call <Bar>         call <Bar>
    Exit     ret           ret                jmp <instrumentation>
                           nop nop nop nop

Insert nop at entry
Insert nops at exit

Rewriting A Function
Original Function

    Point    Before        After
    Entry    push          jmp <rewritten function>
             mov
    Call     call <Foo>    call <Foo>
    Call     call <Bar>    call <Bar>
    Exit     ret           ret

Overwrite entry of original function with jump to rewritten function

Update Jumps and Calls

• PC-relative jump and call instructions:
  – those with destinations outside the function will have incorrect displacements
  – some jumps to locations inside the function will have incorrect displacements
• 2-byte jumps:
  – have a range of 127 bytes forward and 128 bytes backward, measured from the end of the jump instruction
  – if the target address is no longer in range, replace the 2-byte instruction with a 5-byte instruction that has further reach
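The displacement updates can be sketched as follows. This is a minimal illustration with hypothetical function names, covering only a 5-byte call and an unconditional jump. The encodings are standard x86: `EB` with a signed 8-bit displacement for the 2-byte jump, `E9` with a signed 32-bit displacement for the 5-byte jump, both measured from the end of the instruction.

```python
import struct

SHORT_JMP_LEN = 2  # EB rel8
NEAR_JMP_LEN = 5   # E9 rel32
CALL_LEN = 5       # E8 rel32


def relocate_call_disp(old_site, new_site, old_disp):
    """New displacement for a call whose target lies outside the function.

    The target stays where it was; only the call instruction moved.
    """
    target = old_site + CALL_LEN + old_disp
    return target - (new_site + CALL_LEN)


def encode_jmp(site, target):
    """Encode an unconditional jump at site, using 2 bytes when it reaches."""
    disp8 = target - (site + SHORT_JMP_LEN)
    if -128 <= disp8 <= 127:
        return b"\xeb" + struct.pack("<b", disp8)
    # Out of short range: switch to the 5-byte form.
    disp32 = target - (site + NEAR_JMP_LEN)
    return b"\xe9" + struct.pack("<i", disp32)
```

Note that widening a jump from 2 to 5 bytes shifts every instruction after it, which can push other short jumps out of range, so a rewriter generally has to repeat the check until the layout stops changing.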
Status

Dynamic rewriting and function relocation is operational in Paradyn release 3.2 for x86 (Solaris, Linux, Windows NT).

Current Limitations

We do not relocate a function if:
– the application is executing within the function we want to instrument
– it has a jump table

Jumps vs. Traps

Average time to get to instrumentation and back (time in microseconds):

               Trap    Jump
    Solaris    37.6    0.03
    Linux       8.3    0.04

Jumps vs. Traps

• Relocating functions that are performance bottlenecks leads to the greatest speedup
• More instrumentation can be inserted since perturbation to the system is minimized
• In Paradyn, the speedup ratio depends on the type of metric (e.g. CPU time, number of procedure calls)

Some Results

bubba (circuit layout)
• instrumented 9 functions for CPU
  – all required trap for exit point
  – 5 relocated functions
    • called 400 thousand times
    • consumed 20% of CPU
• 23 seconds to execute using relocation
• 42 seconds to execute without relocation

Some Results

fspx (2-D heat transfer simulation)
• 4 of 46 functions required traps
  – all for exit points
• instrumented __atan for CPU
  – required trap for exit
  – called 107 million times
  – consumed 25% of CPU
• 7.5 minutes to execute using relocation
• 115 minutes to execute without relocation

Conclusions

Dynamic rewriting and function relocation:
• Used by Paradyn to allow using jumps instead of traps when profiling applications, to improve performance.
• Crucial for large-scale and fine-grained instrumentation.