Efficient x86 Instrumentation:
Dynamic Rewriting and Function Relocation
Itai Gurari
gurari@cs.wisc.edu
Computer Science Department
University of Wisconsin
1210 W. Dayton St.
Madison, WI 53706-1685
Paradyn/Condor Week
Madison, WI
March 12-14, 2001
Introduction
Dynamic Instrumentation:
• Insert instrumentation into an application
while it is executing
• Used by Paradyn to gather performance data
• Paradyn instrumentation is inserted for
three types of points
– function entry, exit, and call
Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation
Page 2
Paradyn
Instrumentation Points
Executable Code
foo ()
{
call <bar>
}
Paradyn
Instrumentation Points
Entry
Executable Code
foo ()
{
Call
call <bar>
Exit
}
Instrumentation
startTimer()
counter++
stopTimer()
Goal
Transfer from function to instrumentation
code as quickly as possible
Control Transfer
To switch execution from a function to its
instrumentation code:
– Overwrite instructions in function with a control
transfer instruction.
– Equivalent of overwritten instructions are copied
to the code patch area.
– On the x86, Paradyn uses, by default, a 5-byte
jump to transfer control to the instrumentation
code.
• The 5-byte jump can reach the whole address space
– If a 5-byte instruction won’t fit, we use a 1-byte
trap (the int3 instruction).
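A minimal sketch of the two control-transfer encodings named above, assuming 32-bit x86: the 5-byte jump is opcode 0xE9 followed by a 32-bit displacement measured from the end of the jump, and int3 is the single byte 0xCC. The function name and the addresses are illustrative and are not taken from Paradyn.

```python
import struct

JMP_REL32_LEN = 5   # opcode 0xE9 + 4-byte displacement
INT3 = b"\xcc"      # 1-byte breakpoint trap

def encode_jmp_rel32(src, dst):
    """Encode a 5-byte near jump placed at address src that lands on dst."""
    # The displacement is relative to the address of the next instruction.
    # Masking to 32 bits models wrap-around in a 32-bit address space,
    # which is why the jump can reach any address.
    disp = (dst - (src + JMP_REL32_LEN)) & 0xFFFFFFFF
    return b"\xe9" + struct.pack("<I", disp)

forward = encode_jmp_rel32(0x1000, 0x2000)
backward = encode_jmp_rel32(0x2000, 0x1000)
```

The trap needs no operand, which is why it fits wherever a single byte can be overwritten.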
Inserting Control Transfer
Instructions
• Dynamically rewrite function in place
• Different techniques for different types
of instrumentation points
Jumps and Traps
Instrument Entry Point
Case 1
push mov
sub
Enough room to replace
instruction with a jump
jmp <instrumentation>
Jumps and Traps
Instrument Entry Point
Case 2
push mov
jmp
Inserting a jump instruction interferes with
the target of the backwards jump
jmp <instrumentation>
jmp
Must use a trap instruction
to get to instrumentation
int3 mov
jmp
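The two entry-point cases can be sketched as a small decision function. This is an illustration under stated assumptions, not Paradyn's code: the function name is hypothetical, and the instruction lengths in the examples are the usual encodings (push %ebp is 1 byte, mov %esp,%ebp is 2, sub with an 8-bit immediate is 3, a short jmp is 2).

```python
def choose_transfer(insn_lengths, branch_targets, jump_len=5):
    """Return "jump" if a jump_len-byte jump can replace the instructions at
    the start of the function, else "trap".

    insn_lengths   -- byte lengths of the function's instructions, in order
    branch_targets -- offsets from the function start that other
                      instructions in the function branch to
    """
    offset = 0
    for length in insn_lengths:
        if offset >= jump_len:
            return "jump"   # enough whole instructions are covered
        # A branch target past offset 0 inside the overwritten span would
        # land in the middle of the new jump.
        if offset > 0 and offset in branch_targets:
            return "trap"
        offset += length
    return "jump" if offset >= jump_len else "trap"

case1 = choose_transfer([1, 2, 3], set())   # push, mov, sub
case2 = choose_transfer([1, 2, 2], {1})     # push, mov, jmp back to the mov
```

In Case 2 the five bytes are available, but the backward jump targets the mov at offset 1, so only the 1-byte trap is safe.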
Jumps and Traps
Instrument Call Point
call <Foo>
jmp <instrumentation>
Enough room
to replace instruction
with a jump
Jumps and Traps
Instrument Exit Point
Case 1
mov
leave ret
jmp <instrumentation>
Back up far enough to replace
instructions with a jump
Jumps and Traps
Instrument Exit Point
Case 2
call <Foo>
leave ret
Jumps and Traps
Instrument Exit Point
Case 2
call <Foo>
call
leave ret
jmp <instrumentation>
Jump interferes with
the preceding call
Jumps and Traps
Instrument Exit Point
Case 2a
call <Foo>
leave ret
Beginning of next
function
(4-byte boundary)
Jumps and Traps
Instrument Exit Point
Case 2a
Compiler pads
with “bonus bytes”
call <Foo>
leave ret
?
?
?
Beginning of next
function
(4-byte boundary)
Jumps and Traps
Instrument Exit Point
Case 2a
Compiler pads
with “bonus bytes”
call <Foo>
leave ret
Replace instructions
with a jump
call <Foo>
?
?
?
Beginning of next
function
(4-byte boundary)
jmp <instrumentation>
Jumps and Traps
Instrument Exit Point
Case 2b
call <Foo>
leave ret
?
Not enough
“bonus bytes”
to overwrite
with a jump
(if any)
Jumps and Traps
Instrument Exit Point
Case 2b
call <Foo>
call <Foo>
leave ret
leave int3
?
?
Not enough
“bonus bytes”
to overwrite
with a jump
(if any)
Overwrite
return with
a trap
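Cases 2a and 2b differ only in how many padding bytes follow the ret. A rough sketch of that arithmetic, assuming the next function starts at the next 4-byte boundary as the slides state, and that leave (0xC9) and ret (0xC3) are one byte each; the function names are hypothetical.

```python
LEAVE_LEN = 1   # leave is the single byte 0xC9
RET_LEN = 1     # near ret is the single byte 0xC3

def bonus_bytes(ret_end, alignment=4):
    """Padding between the end of ret and the next aligned function start."""
    return (-ret_end) % alignment

def exit_transfer(leave_addr, alignment=4, jump_len=5):
    """Pick "jump" or "trap" for an exit whose leave/ret follows a call."""
    ret_end = leave_addr + LEAVE_LEN + RET_LEN
    room = LEAVE_LEN + RET_LEN + bonus_bytes(ret_end, alignment)
    return "jump" if room >= jump_len else "trap"

case_2a = exit_transfer(0x1003)   # ret ends at 0x1005, 3 bonus bytes
case_2b = exit_transfer(0x1005)   # ret ends at 0x1007, 1 bonus byte
```

With 4-byte alignment there are at most three bonus bytes, so a 5-byte jump fits only when all three are present.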
Jumps and Traps
Extra slot
No jumps to first ten bytes of function
push mov
sub
mov
jmp <instrumentation>
mov
Enough space to
overwrite entry
with a jump
Jumps and Traps
Extra slot
No jumps to first ten bytes of function
push mov
sub
Enough space to
overwrite entry
with a jump
jmp <instrumentation>
mov
Make 2-byte jump to “extra
slot”, overwrite “extra slot”
with jump to instrumentation
jmp <instrumentation>
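The extra-slot trick chains two jumps: a 2-byte short jump (opcode 0xEB with a signed 8-bit displacement) at the instrumentation point, and a 5-byte jump written into the spare bytes. The sketch below shows the bytes involved; the helper names and all addresses are made up for illustration.

```python
import struct

def encode_jmp_rel8(src, dst):
    """2-byte short jump (0xEB + signed 8-bit displacement) from src to dst."""
    disp = dst - (src + 2)
    if not -128 <= disp <= 127:
        raise ValueError("extra slot is out of short-jump range")
    return b"\xeb" + struct.pack("<b", disp)

def encode_jmp_rel32(src, dst):
    """5-byte near jump (0xE9 + 32-bit displacement) from src to dst."""
    disp = (dst - (src + 5)) & 0xFFFFFFFF
    return b"\xe9" + struct.pack("<I", disp)

def extra_slot_patch(point, slot, instrumentation):
    """Return {address: bytes}: a short jump at point that hops to slot,
    where a 5-byte jump continues to the instrumentation."""
    return {
        point: encode_jmp_rel8(point, slot),
        slot: encode_jmp_rel32(slot, instrumentation),
    }

# Hypothetical layout: function at 0x1000, entry jump in bytes 0-4,
# extra slot in bytes 5-9, second instrumentation point at 0x1010.
patch = extra_slot_patch(point=0x1010, slot=0x1005, instrumentation=0x5000)
```

The point itself only needs two free bytes, at the cost of one additional jump on the way to the instrumentation.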
Control Transfer
Traps on x86
• Generate an exception that is caught by either
the application (Solaris, Linux) or the Paradyn
daemon (Windows NT).
• Address of trap instruction is used to
calculate which instrumentation code to execute.
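One way to picture the lookup is a table keyed by trap address. This is only a sketch: the table layout and names are assumptions, and it relies on the x86 behavior that the program counter saved for an int3 exception points just past the 1-byte trap.

```python
INT3_LEN = 1

def instrumentation_for_trap(trap_table, reported_pc):
    """Map the program counter saved by an int3 exception to the address of
    the instrumentation code registered for that trap.

    The saved PC points just past the int3, so the trap instruction itself
    sits at reported_pc - 1.
    """
    trap_addr = reported_pc - INT3_LEN
    return trap_table[trap_addr]

# Hypothetical table: one trap at 0x1004 leading to code at 0x5000.
trap_table = {0x1004: 0x5000}
target = instrumentation_for_trap(trap_table, reported_pc=0x1005)
```

Every trap pays for the exception delivery and this lookup before any instrumentation runs.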
Problem
Trap handling is slow:
• On Solaris 2.6, jumps are over 1000 times faster
than traps.
• On Linux 2.2, jumps are over 200 times faster
than traps.
Traps limit instrumentation:
• Cannot insert as much instrumentation, or at as
fine a granularity
Trap handling logic is difficult:
• Susceptible to bugs
• Difficult to understand and maintain
Solution
Rewrite functions that do not have enough
room for jumps into functions that do.
– Rewrite the function on the fly: combines
dynamic instrumentation and binary rewriting.
Dynamic Rewriting
Dynamic
Rewriting
overwrite
existing
instructions
expand
instrumentation
points
Relocate Function
Function Rewriting and Relocation
In Paradyn we rewrite a function:
– only if the function contains an
instrumentation point that would require using
a trap to instrument
– the first time a request to instrument the
function is made
– even if the instrumentation to be inserted is
not for a point that requires using a trap
• e.g. the exit needs a trap, the entry can use
a jump, and the request is to instrument the entry
Function Rewriting and Relocation
(continued)
– all instrumentation points that cannot use a
jump are expanded.
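The rewriting policy can be summarized in a few lines. The classes and function below are hypothetical stand-ins for whatever Paradyn keeps internally; they only restate the rules listed on these two slides.

```python
from dataclasses import dataclass

@dataclass
class Point:
    kind: str          # "entry", "exit", or "call"
    needs_trap: bool   # True if a 5-byte jump does not fit in place

@dataclass
class Function:
    name: str
    points: list
    relocated: bool = False

def on_instrument_request(func, requested):
    """Return the points to expand in a rewritten copy of func, or [] if no
    rewrite happens on this request."""
    if func.relocated:
        return []   # already rewritten on an earlier request
    # The requested point is deliberately ignored: one trap-only point
    # anywhere in the function is enough to trigger the rewrite.
    expand = [p for p in func.points if p.needs_trap]
    if expand:
        func.relocated = True
    return expand

entry, exit_ = Point("entry", False), Point("exit", True)
f = Function("foo", [entry, exit_])
first = on_instrument_request(f, entry)    # entry itself could use a jump
second = on_instrument_request(f, exit_)
```

Here the first request targets the entry, yet the exit is the point that gets expanded.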
Rewriting A Function
Entry
Call
push mov
call <Foo>
call <Bar>
ret
Call
Exit
Rewriting A Function
Entry
Call
Insert nop at entry
push nop mov
call <Foo>
call <Bar>
ret
Call
Exit
Rewriting A Function
Entry
Call
Insert nop at entry
jmp <instrumentation> call <Foo>
call <Bar>
ret
Call
Exit
Rewriting A Function
Entry
Call
Insert nop at entry
jmp <instrumentation> call <Foo>
call <Bar>
ret nop nop nop nop
Insert nops at exit
Call
Exit
Rewriting A Function
Entry
Call
Insert nop at entry
jmp <instrumentation> call <Foo>
call <Bar>
jmp <instrumentation>
Insert nops at exit
Call
Exit
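Expanding a point amounts to padding it with 1-byte nops (0x90) until a 5-byte jump fits. The helper below is a simplified sketch with a hypothetical name: it always appends the padding after the point's instructions, whereas the entry slide shows the nop placed between push and mov.

```python
NOP = b"\x90"

def pad_for_jump(point_bytes, jump_len=5):
    """Append nops so the instructions at a point span at least jump_len bytes."""
    missing = max(0, jump_len - len(point_bytes))
    return point_bytes + NOP * missing

exit_point = pad_for_jump(b"\xc3")                 # ret alone is 1 byte
call_point = pad_for_jump(b"\xe8\x00\x00\x00\x00") # a call is already 5 bytes
```

For the exit this gives ret followed by four nops, matching the slide; the call needs no padding.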
Rewriting A Function
Original Function
Call
Entry
push mov
call <Foo>
call <Bar>
ret
Call
Exit
Rewriting A Function
Original Function
Entry
Overwrite entry of original
function with jump to
rewritten function
jmp <rewritten function>
call <Foo>
call <Bar>
ret
Call
Exit
Update Jumps and Calls
• PC-relative jump and call instructions:
– with destinations outside the function will
have incorrect displacements
– some jumps to locations inside the function
will have incorrect displacements
• 2-byte jumps:
– have a range of 127 bytes forward, 128 bytes
backward (signed 8-bit displacement)
– if target address is no longer in range,
replace 2-byte instruction with 5-byte
instruction that has further reach
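Both fix-ups follow from the fact that displacements are measured from the end of the instruction. The sketch below recomputes a call's displacement after the function moves and picks the short or long form for an unconditional jump. Names and addresses are illustrative, and conditional jumps, whose long form has a different encoding, are left out.

```python
import struct

def relocated_call_disp(old_addr, new_addr, old_disp, insn_len=5):
    """New rel32 displacement for a call whose target lies outside the
    relocated function, so the absolute target stays the same."""
    target = old_addr + insn_len + old_disp
    return target - (new_addr + insn_len)

def encode_jmp(src, dst):
    """Unconditional jump: 2-byte short form when in range, else 5-byte."""
    disp8 = dst - (src + 2)
    if -128 <= disp8 <= 127:
        return b"\xeb" + struct.pack("<b", disp8)
    disp32 = (dst - (src + 5)) & 0xFFFFFFFF
    return b"\xe9" + struct.pack("<I", disp32)

new_disp = relocated_call_disp(old_addr=0x1000, new_addr=0x9000, old_disp=0x100)
near = encode_jmp(0x1000, 0x1010)   # still fits in 2 bytes
far = encode_jmp(0x1000, 0x1100)    # must grow to 5 bytes
```

Widening a jump makes the function three bytes longer, which can push other short jumps out of range, so a rewriter would have to repeat the check until nothing changes.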
Status
Dynamic rewriting and function relocation are
operational in Paradyn release 3.2 for x86
(Solaris, Linux, Windows NT).
Current Limitations
We do not relocate a function if:
– the application is executing within the
function we want to instrument
– it has a jump table
Jumps vs. Traps
Trap handling:
Average time to get to instrumentation and back (in microseconds)

            Trap    Jump
Solaris     37.6    0.03
Linux        8.3    0.04
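As a quick check, the trap-to-jump ratios implied by these timings can be computed directly. Only the four measurements come from the slide; the dictionary and function name are illustrative.

```python
# Average round-trip times in microseconds, as reported on the slide.
timings_us = {
    "Solaris": {"trap": 37.6, "jump": 0.03},
    "Linux": {"trap": 8.3, "jump": 0.04},
}

def trap_to_jump_ratio(platform):
    """How many times slower a trap is than a jump on the given platform."""
    t = timings_us[platform]
    return t["trap"] / t["jump"]

solaris_ratio = trap_to_jump_ratio("Solaris")
linux_ratio = trap_to_jump_ratio("Linux")
```

The results, roughly 1250 for Solaris and a little over 200 for Linux, agree with the "over 1000 times" and "over 200 times" figures quoted earlier.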
Jumps vs. Traps
• Relocating functions that are performance
bottlenecks leads to the greatest speedup
• More instrumentation can be inserted
since perturbation to system is minimized.
• In Paradyn, ratio of speedup depends on
type of metric (e.g. CPU time, number of
procedure calls)
Some Results
bubba (circuit layout)
• instrumented 9 functions for CPU
– all required trap for exit point
– 5 relocated functions
• called 400 thousand times
• consumed 20% of CPU.
• 23 seconds to execute using relocation
• 42 seconds to execute without relocation
Some Results
fspx (2-D heat transfer simulation)
• 4 of 46 functions required traps
– all for exit points
• instrumented __atan for CPU
– required trap for exit
– called 107 million times
– consumed 25% of CPU.
• 7.5 minutes to execute using relocation
• 115 minutes to execute without relocation
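The speedups implied by the two sets of run times can be worked out as follows. The run times are the ones reported on the result slides; the helper name is illustrative.

```python
def speedup(without_relocation, with_relocation):
    """Ratio of instrumented run time without relocation to run time with it."""
    return without_relocation / with_relocation

bubba = speedup(42, 23)     # seconds
fspx = speedup(115, 7.5)    # minutes
```

This gives about 1.8x for bubba and about 15x for fspx, where the instrumented function is called far more often (107 million calls versus 400 thousand).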
Conclusions
Dynamic rewriting and function relocation:
• Used by Paradyn to allow using jumps,
instead of traps, when profiling
applications, to improve performance.
• Crucial for large scale and fine-grained
instrumentation.