Pin: Intel’s Dynamic Binary
Instrumentation Engine
Pin Tutorial
Intel Corporation
Presented By:
Robert Cohn
Tevi Devor
CGO 2010
1
Software & Services Group
Agenda
• Part1: Introduction to Pin
• Part2: Larger Pin tools and writing efficient Pin
tools
• Part3: Deeper into Pin API
• Part4: Advanced Pin API
• Part5: Performance #s
Software & Services Group
2
Part1: Introduction to Pin
• Dynamic Binary Instrumentation
• Pin Capabilities
• Overview of how Pin works
• Sample Pin Tools
Software & Services Group
3
What Does “Pin” Stand For?
• Three Letter Acronyms @ Intel
– TLAs
– 263 possible TLAs
– 263 -1 are in use at Intel
– Only 1 is not approved for use at Intel
– Guess which one:
• Pin Is Not an acronym
• Pin is based on the post link optimizer Spike
– Use dynamic code generation to make a less intrusive
profile guided optimization and instrumentation system
– Pin is a small Spike
– Spike is EOL
http://www.cgo.org/cgo2004/papers/01_82_luk_ck.pdf
Software & Services Group
4
Instrumentation
A technique that inserts code into a program to collect
run-time information
 Program analysis : performance profiling, error detection,
capture & replay
 Architectural study : processor and cache simulation, trace
collection
• Source-Code Instrumentation
• Static Binary Instrumentation
• Dynamic Binary Instrumentation
 Instrument code just before it runs (Just In Time – JIT)
– No need to recompile or re-link
– Discover code at runtime
– Handle dynamically-generated code
Pin istoarunning
dynamic
binary
– Attach
processes
instrumentation engine
Software & Services Group
5
Advantages of Pin Instrumentation
• Programmable Instrumentation:
– Provides rich set of APIs to write, in C,C++,assembly, your own
instrumentation tools, called PinTools
– APIs are designed to maximize ease of use
– abstract away the underlying instruction set idiosyncrasies
• Multiplatform:
– Supports IA-32, Intel64, IA-64
– Supports Linux, Windows, MacOS
• Robust:
–
–
–
–
Can instrument real-life applications: Database, web browsers, …
Can instrument multithreaded applications
Supports signals and exceptions, self modifying code…
If you can Run it – you can Pin it
• Efficient:
Pin can be used to instrument all the user level code
– Applies compiler optimizations
on instrumentation code
in an application
Software & Services Group
6
Pin Instrumentation Capabilities
• Use Pin APIs to write PinTools that:
– Replace application functions with your own
– Call the original application function from within your replacement
function
– Fully examine any application instruction, and insert a call to your
instrumenting function to be executed whenever that instruction
executes
– Pass parameters to your instrumenting function from a large set of
supported parameters
–
–
–
–
Register values (including IP), Register values by reference (for modification)
Memory addresses read/written by the instruction
Full register context
….
– Track function calls including syscalls and examine/change
arguments
– Track application threads
– Intercept signals
If Pina doesn’t
have it, you don’t want it
– Instrument
process tree
– Many other capabilities…
Software & Services Group
7
Usage of Pin at Intel
• Profiling and analysis products
– Intel Parallel Studio
– Amplifier (Performance Analysis)
– Lock and waits analysis
– Concurrency analysis
– Inspector (Correctness Analysis)
– Threading error detection (data race and deadlock)
– Memory error detection
GUI
Algorithm
PinTool
Pin
• Architectural research and enabling
– Emulating new instructions (Intel SDE)
– Trace generation
– Branch prediction and cache modeling
Software & Services Group
8
Example Pin-tools
Cache
Simulation
CMP$IM
Instruction
Emulation
MT Workload
Capture
& Deterministic
Replay
PinPlay
(new instructions)
SDE
Pin
Simulation
Trace
Region
Generation
Selection
SDE: http://software.intel.com/en-us/articles/intel-software-development-emulator
pinLIT
PinPoints
CMP$IM: http://www-mount.ece.umn.edu/~jjyi/MoBS/2008/program/02A-Jaleel.pdf
PinPlay: Paper presented at CGO2010 http://www.cgo.org/cgo2010/program.html
Software & Services Group
9
Pin Usage Outside Intel
• Popular and well supported
– 30,000+ downloads, 400+ citations
• Free DownLoad
– www.pintool.org
– Includes: Detailed user manual, source code for 100s of
Pin tools
• Pin User Group (PinHeads)
– http://tech.groups.yahoo.com/group/pinheads/
– Pin users and Pin developers answer questions
Software & Services Group
10
Launcher Process
PIN.EXE
Count
258743109
pin.exe
–tInvocation
inscount.dll
– gzip.exe input.txt
Pin
gzip.exe input.txt
Read a at
Trace
Application
Starting
firstfrom
application
IP Read
Code
PinTool
a Trace from Application
Codethat counts application
Source
Trace
exit branch
is
Start PINVM.DLL
Execution
of Trace
ends
instructions
executed, prints Count
Jit
adding
instrumentation
code
modified
to instrumentation
directly
branch
to
running
Jit
it,it,
adding
code
at Jit
endnext
LoadTrace to
from
inscount.dll
Call
into
PINVM.DLL
Destination
from
inscount.dll
Load
PINVM.DLL
inscount.dll
(firstAppIp,
trace
Encode
the
jitted
trace
the
and
run into
its
“inscount.dll”)
Encode
the
trace
theinto
Code
Code
Cache
Pass
in
app
IP
of
Trace’s
target
main()
Cache
Launcher
Execute Jitted code
WriteProcessMemory(BootRoutine,
BootData)
Resume
atand
BootRoutine
GetContext(&firstAppIp)
Inject
Pin BootRoutine
Data intosuspended)
application
SetContext(BootRoutineIp)
CreateProcess
(gzip.exe,
input.txt,
Boot Routine +
Data:
firstAppIp,
“Inscount.dll”
First
app
IP
Application Process
PIN.LIB
System Call
Dispatcher
Event
Dispatcher
Encoder
PINVM.DLL
Decoder
Application
Code and
Data
inscount.dll
Code
Cache
Thread
Dispatcher
NTDLL.DLL
app Ip of
Trace’s
target
Windows kernel
Software & Services Group
11
All code in this presentation is covered
by the following:
•
•
/*BEGIN_LEGAL
Intel Open Source License
•
•
•
•
•
Copyright (c) 2002-2010 Intel Corporation. All rights reserved.
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer. Redistributions
in binary form must reproduce the above copyright notice, this list of
conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution. Neither the name of
the Intel Corporation nor the names of its contributors may be used to
endorse or promote products derived from this software without
specific prior written permission.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE INTEL OR
ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
END_LEGAL */
Software & Services Group
12
Instruction Counting Tool (inscount.dll)
#include "pin.h"
UINT64 icount = 0;
Execution time routine
void docount() { icount++; }
void Instruction(INS ins, void *v)
Jitting
{
INS_InsertCall(ins, IPOINT_BEFORE,
(AFUNPTR)docount, IARG_END);
}
time routine: Pin CallBack
void Fini(INT32 code, void *v)
{ std::cerr << "Count " << icount << endl; }
•
int main(int argc, char * argv[])
{
PIN_Init(argc, argv);
INS_AddInstrumentFunction(Instruction, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram(); // Never returns
return 0;
}
•
•
•
switch to pin stack
save registers
call docount
restore registers
inc icount
switch
to app stack
sub
$0xff, %edx
inc icount
cmp
%esi, %edx
save eflags
inc icount
restore eflags
jle
<L1>
inc icount
mov
0x1, %edi
Software & Services Group
13
Launcher Process
pin.exe –t inscount.dll – gzip.exe input.txt
PIN.EXE
Read a Trace from Application Code
Jit it, adding instrumentation code
from inscount.dll
Launcher
Encode the Jitted trace into the
Code Cache
Boot Routine +
Data:
firstAppIp,
“Inscount.dll”
First
app
IP
Application Process
PIN.LIB
System Call
Dispatcher
Event
Dispatcher
Encoder
PINVM.DLL
Decoder
Application
Code and
Data
inscount.dll
Code
Cache
Thread
Dispatcher
NTDLL.DLL
Windows kernel
Software & Services Group
14
Trace
Trace
BBL#1
TK
FT
’
BBL#1
BBL#2
BBL#4
’
BBL#2
TK
BBL#3
Original
code
BBL# 5
Early Exit via Stub
BBL# 6
Early Exit via Stub
Trace Exit via Stub
FT
BBL# 7
BBL#3
• Trace: A sequence of continuous instructions, with
one entry point
• BBL: has one entry point and ends at first control
transfer instruction
Software & Services Group
15
ManualExamples/inscount2.cpp
#include "pin.H"
UINT64 icount = 0;
void PIN_FAST_ANALYSIS_CALL docount(INT32 c) { icount += c; }
void Trace(TRACE trace, void *v){{
for(BBL bbl = TRACE_BblHead(trace);
BBL_Valid(bbl);
bbl = BBL_Next(bbl))
BBL_InsertCall(bbl, IPOINT_ANYWHERE,
(AFUNPTR)docount, IARG_FAST_ANALYSIS_CALL,
IARG_UINT32, BBL_NumIns(bbl),
IARG_END);
}
void Fini(INT32 code, void *v) {// Pin Callback
fprintf(stderr, "Count %lld\n", icount);
}
int main(int argc, char * argv[]) {
PIN_Init(argc, argv);
TRACE_AddInstrumentFunction(Trace, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram();
return 0;
}
Software & Services Group
16
2
22
40
37
17
APP IP
0x77ec4600
0x77ec4603
0x77ec4609
0x77ec460d
cmp
jz
movzx
call
rax, rdx
0x77f1eac9
ecx, [rax+0x2]
0x77ef7870
20 0x001de0000 mov r14, 0xc5267d40
//inscount2.docount
58 0x001de000a add [r14], 0x2
//inscount2.docount
2 0x001de0015 0x77ec4600 cmp rax, rdx
9 0x001de0018 jz 0x1deffa0 (L1) //patched in future
52 0x001de001e mov r14, 0xc5267d40
//inscount2.docount
29 0x001de0028 mov [r15+0x60], rax
57 0x001de002c lahf
save
status
37 0x001de002e seto al
flags
50 0x001de0031 mov [r15+0xd8], ax
30 0x001de0039 mov rax, [r15+0x60]
12 0x001de003d add [r14], 0x2
//inscount2.docount
40 0x001de0048 0x77ec4609 movzx edi, [rax+0x2] //ecx alloced to edi
22 0x001de004c push 0x77ec4612
//push retaddr
61 0x001de0051 nop
17 0x001de0052 jmp 0x1deffd0 (L2)//patched in future
(L1)
41 0x001deffa0 mov [r15+0x40], rsp // save app rsp
63 0x001deffa4 mov rsp, [r15+0x2d0] // switch to pin stack
56 0x001deffab call [0x2f000000]
// call VmEnter
// data used by VmEnter – pointed to by return-address of call
0x001deffb8_svc(VMSVC_XFER)
0x001deffc0_sct(0x00065f998) // current register
// mapping
0x001deffc8_iaddr(0x077f1eac9)
// app target IP of jz
(L2)
// at 0x77ec4603
24 0x001deffd0 mov [r15+0x40], rsp // save app rsp
34 0x001deffd4 mov rsp, [r15+0x2d0] // switch to pin stack
66 0x001deffdb call [0x2f000000]// call VmEnter
// data used by VmEnter – pointed to by return-address of call
0x001deffe8_svc(VMSVC_XFER)
0x001defff0_sct(0x00065fb60)
// current register mapping
0x001defff8_iaddr(0x077ef7870)
// app target IP of
Software & Services Group
// call at 0x77ec460d
SimpleExamples/inscount2_mt.cpp
#include "pin.H"
INT32 numThreads = 0;
const INT32 MaxNumThreads = 10000;
struct THREAD_DATA
{
UINT64 _count;
UINT8 _pad[56]; // guess why? }icount[MaxNumThreads];
// Analysis routine
VOID PIN_FAST_ANALYSIS_CALL docount(ADDRINT c, THREADID tid) { icount[tid]._count += c;}
// Pin Callback
VOID ThreadStart(THREADID threadid, CONTEXT *ctxt, INT32 flags, VOID *v){numThreads++;}
VOID Trace(TRACE trace, VOID *v) { // Jitting time routine: Pin Callback
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
BBL_InsertCall(bbl, IPOINT_ANYWHERE, (AFUNPTR)docount, IARG_FAST_ANALYSIS_CALL,
IARG_UINT32, BBL_NumIns(bbl), IARG_THREAD_ID, IARG_END);
}
VOID Fini(INT32 code, VOID *v){// Pin Callback
for (INT32 t=0; t<numThreads; t++)
printf ("Count[of thread#%d]= %d\n",t,icount[t]._count);
}
int main(int argc, char * argv[])
{
PIN_Init(argc, argv);
for (INT32 t=0; t<MaxNumThreads; t++) {icount[t]._count = 0;}
PIN_AddThreadStartFunction(ThreadStart, 0);
TRACE_AddInstrumentFunction(Trace, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram(); return 0;
}
Software & Services Group
18
Multi-Threading
• Pin supports multi-threading
– Application threads execute jitted code including
instrumentation code (inlined and not inlined), without any
serialization introduced by Pin
– Instrumentation code can use Pin and/or OS synchronization
constructs to introduce serialization if needed.
– Will see examples of this in Part4
– Pin provides APIs for thread local storage.
– Will see examples in Part3
– Pin callbacks are serialized
– Jitting is serialized
– Only one application thread can be jitting code at any time
Software & Services Group
19
Memory Read Logger Tool
#include "pin.h“#include <map>
std::map<ADDRINT, std::string> disAssemblyMap;
VOID ReadsMem (ADDRINT applicationIp, ADDRINT memoryAddressRead, UINT32 memoryReadSize) {
printf ("0x%x %s reads %d bytes of memory at 0x%x\n",
applicationIp, disAssemblyMap[applicationIp].c_str(),
memoryReadSize, memoryAddressRead);}
VOID Instruction(INS ins, void * v) {// Jitting time routine
// Pin Callback
if (INS_IsMemoryRead(ins))
{
disAssemblyMap[INS_Address(ins)] = INS_Disassemble(ins);
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) ReadsMem,
IARG_INST_PTR,// application IP
IARG_MEMORYREAD_EA,
IARG_MEMORYREAD_SIZE,
IARG_END);
}
}
Pin has
determined that it
can overwrite ecx
int main(int argc, char * argv[]) {
PIN_Init(argc, argv);
INS_AddInstrumentFunction(Instruction, 0);
Switch to pin stack
push 4
push %eax
push 0x7f083de
call ReadsMem
Pop args off pin stack
Switch back to app stack
• inc DWORD_PTR[%eax]
Switch to pin stack
push 4
lea %ecx,[%esi]0x8
push %ecx
push 0x7f083e4
call ReadsMem
Pop args off pin stack
Switch back to app stack
• inc DWORD_PTR[%esi]0x8
PIN_StartProgram(); }
Software & Services Group
20
Malloc Replacement
#include "pin.H"
void * MallocWrapper( CONTEXT * ctxt, AFUNPTR pf_malloc, size_t size)
{ // Simulate out-of-memory every so often
void * res;
if (TimeForOutOfMem())
return (NULL);
PIN_CallApplicationFunction(ctxt, PIN_ThreadId(),
CALLINGSTD_DEFAULT, pf_malloc,
PIN_PARG(void *), &res, PIN_PARG(size_t), size);
return res; }
VOID ImageLoad(IMG img, VOID *v) { // Pin callback. Registered by IMG_AddInstrumentFunction
if (strstr(IMG_Name(img).c_str(), "libc.so") ||
strstr(IMG_Name(img).c_str(), "MSVCR80") || strstr(IMG_Name(img).c_str(), "MSVCR90"))
{
RTN mallocRtn = RTN_FindByName(img, "malloc");
PROTO protoMalloc = PROTO_Allocate( PIN_PARG(void *), CALLINGSTD_DEFAULT,
"malloc", PIN_PARG(size_t), PIN_PARG_END() );
}}
RTN_ReplaceSignature(mallocRtn, AFUNPTR(MallocWrapper),
IARG_PROTOTYPE, protoMalloc,
IARG_CONTEXT,
IARG_ORIG_FUNCPTR,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
int main(int argc, CHAR *argv[]) {
PIN_InitSymbols();
PIN_Init(argc,argv));
IMG_AddInstrumentFunction(ImageLoad, 0);
PIN_StartProgram();
}
Software & Services Group
21
Pin Probe-Mode
• Probe mode is a method of using Pin to wrap or replace
application functions with functions in the tool. A jump
instruction (probe), which redirects the flow of control to the
replacement function is placed at the start of the specified
function.
• The bytes being overwritten are relocated, so that Pin can
provide the replacement function with the address of the first
relocated byte. This enables the replacement function to call
the replaced (original) function.
• In probe mode, the application and the replacement routine
are run natively (not Jitted). This improves performance, but
puts more responsibility on the tool writer. Probes can only be
placed on RTN boundaries, and should inserted within the
Image load callback. Pin will automatically remove the probes
when an image is unloaded.
• Many of the PIN APIs that are available in JIT mode are not
available in Probe mode.
Software & Services Group
22
Malloc Replacement Probe-Mode
#include "pin.H"
void * MallocWrapper(AFUNPTR pf_malloc, size_t size)
{ // Simulate out-of-memory every so often
void * res;
if (TimeForOutOfMem())
return (NULL);
res = pf_malloc(size);
return res; }
VOID ImageLoad (IMG img, VOID *v) {
if (strstr(IMG_Name(img).c_str(), "libc.so") ||
strstr(IMG_Name(img).c_str(), "MSVCR80") || strstr(IMG_Name(img).c_str(), "MSVCR90"))
{
RTN mallocRtn = RTN_FindByName(img, "malloc");
if ( RTN_Valid(mallocRtn) && RTN_IsSafeForProbedReplacement(mallocRtn) )
{
PROTO proto_malloc = PROTO_Allocate(PIN_PARG(void *), CALLINGSTD_DEFAULT, "malloc",
PIN_PARG(size_t), PIN_PARG_END() );
} }}
RTN_ReplaceSignatureProbed (mallocRtn,
AFUNPTR(MallocWrapper),
IARG_PROTOTYPE, proto_malloc,
IARG_ORIG_FUNCPTR,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
int main(int argc, CHAR *argv[]) {
PIN_InitSymbols();
PIN_Init(argc,argv));
IMG_AddInstrumentFunction(ImageLoad, 0);
PIN_StartProgramProbed();
}
Software & Services Group
23
SDE
•SDE: A fast functional simulator for
applications with new instructions
–New instructions have been defined
–Compiler generates code with new
instructions
–What can be used to run the apps with
the new instructions?
–Use PinTool that emulates new instructions.
– vmovdqu ymm?, mem256
vmovdqu mem256, ymm?
– 16 new 256 bit ymm registers
– Read/Write ymm register from/to memory.
Software & Services Group
24
Launcher Process
pin.exe –t inscount.dll – gzip.exe input.txt
PIN.EXE
Read a Trace from Application Code
Jit it, adding instrumentation code
from inscount.dll
Launcher
Encode the Jitted trace into the
Code Cache
Execute it
Boot Routine +
Data:
firstAppIp,
“Inscount.dll”
First
app
IP
Application Process
PIN.LIB
System Call
Dispatcher
Event
Dispatcher
Encoder
PINVM.DLL
Decoder
Application
Code and
Data
inscount.dll
Code
Cache
Thread
Dispatcher
NTDLL.DLL
Windows kernel
Software & Services Group
25
#include "pin.H"
sde_emul.dll Schema
VOID EmVmovdquMem2Reg(unsigned int ymmDstRegNum, ADDRINT * ymmMemSrcPtr) {
PIN_SafeCopy(ymmRegs[ymmDstRegNum], ymmMemSrcPtr, 32); }
VOID EmVmovdquReg2Mem(int ymmSrcRegNum, ADDRINT * ymmMemDstPtr) {
PIN_SafeCopy(ymmMemDstPtr, ymmRegs[ymmRegNum], 32); }
VOID Instruction(INS ins, VOID *v) {
switch (INS_Opcode(ins)
{
:::::
case XED_ICLASS_VMOVDQU:
if (INS_IsMemoryRead(ins))
// vmovdqu ymm? <= mem256
INS_InsertCall(ins, IPOINT_BEFORE,
(AFUNPTR)EmVmovdquMem2Reg,
IARG_UINT32, REG(INS_OperandReg(ins, 0)) - REG_YMM0,
IARG_MEMORYREAD_EA,
IARG_END);
else if (INS_IsMemoryWrite(ins)) // vmovdqu mem256 <= ymm?
INS_InsertCall(ins, IPOINT_BEFORE,
(AFUNPTR)EmVmovdquReg2Mem,
IARG_UINT32, REG(INS_OperandReg(ins, 1)) - REG_YMM0,
IARG_MEMORYWRITE_EA,
IARG_END);
INS_DeleteIns(ins); //Processor does NOT execute this instruction
break;
}}
int main(int argc, CHAR *argv[]) {
PIN_Init(argc,argv));
INS_AddInstrumentFunction(Instruction, 0);
PIN_StartProgram(); }
Software & Services Group
26
fork
gzip
(Injectee)
Pin (Injectee)
Pin stack
Code to Save
MiniLoader
Code to Save
Pin Code
gzip
Code and
and
Data
Data
exitLoop = FALSE;
Linux
Invocation+Injection
pin –t
inscount.so
– gzip input.txt
gzip
input.txt
Ptrace TraceMe
Child
(Injector)
PinTool that counts application
while(!exitLoop){}
instructions executed, prints Count
Ptrace Injectee – Injectee Freezes
at end
MiniLoader
Injectee.exitLoop = TRUE;
MiniLoader
Ptrace continue (unFreezes Injectee)
execv(gzip);
// Injectee Freezes
Execution of Injector
resumes after execv(gzip)
in Injectee completes
Pin Code and
Data
Ptrace Copy (save, gzip.CodeSegment, sizeof(MiniLoader))
PtraceGetContext (gzip.OrigContext)
PtraceCopy (gzip.CodeSegment, MiniLoader, sizeof(MiniLoader))
MiniLoader
IP
Pin Code and
Data
Ptrace continue@MiniLoader (unFreezes Injectee)
MiniLoader loads Pin+Tool,
allocates Pin stack
Kill(SigTrace, Injector):
Freezes until Ptrace Cont
Wait for MiniLoader
complete (SigTrace from
Injectee)
gzip OrigCtxt
gzip OrigCtxt
Ptrace Copy (gzip.CodeSegment, save, sizeof(MiniLoader))
Ptrace Copy (gzip.pin.stack, gzip.OrigCtxt, sizeof (ctxt))
Ptrace SetContext (gzip.IP=pin, gzip.SP=pin.Stack)
Code to Save
Inscount2.so
Ptrace Detach
Software & Services Group
27
Part1 Summary
• Pin is Intel’s dynamic binary instrumentation engine
• Pin can be used to instrument all user level code
–
–
–
–
–
–
Windows, Linux
IA-32, Intel64, IA64
Product level robustness
Jit-Mode for full instrumentation: Thread, Function, Trace, BBL, Instruction
Probe-Mode for Function Replacement/Wrapping/Instrumentation only.
Pin supports multi-threading, no serialization of jitted application nor of instrumentation code
• Pin API makes Pin Tools easy to write
– Presented 6 full Pin tools, each one fit on 1 ppt slide
• Popular and well supported
– 30,000+ downloads, 400+ citations
• Free DownLoad
– www.pintool.org
– Includes: Detailed user manual, source code for 100s of Pin tools
• Pin User Group
– http://tech.groups.yahoo.com/group/pinheads/
– Pin users and Pin developers answer questions
Software & Services Group
28
Part2: Larger Pin tools and writing
efficient Pin tools
Software & Services Group
29
CMP$im – A CMP Cache Simulation Pin Tool
WORK LOAD
Modeling an 8-core CMP using CMP$im
Instrumentation Routines
PIN
• ThreadID
• Address,
Size
• Access
Type
ThreadID, Address, Size, Access Type
DL1 DL1 DL1 DL1 DL1 DL1 DL1 DL1
L2
L2
L2
L2
L2
L2
L2
L2
LLC
LLC
INTERCONNECT
Cache model
• Params to
configure # cache
levels, size,
threads/cache etc
30
LLC
LLC
LLC
LLC
LLC
LLC
PRIVATE LLC/SHARED BANKED LLC
Software & Services Group
CMP$im author: Aamer.Jaleel@intel.com
ANALYSIS
ROUTINES
INSTR
ROUTINES
•VOID Instruction(INS ins, VOID *v)
•{
•
if( INS_IsMemoryRead(ins) ) // If instruction reads
•
// from memory
•
INS_InsertCall(ins,
•
IPOINT_BEFORE, (AFUNPTR)MemoryReference,
•
IARG_THREAD_ID, IARG_MEMORYREAD_EA,
•
IARG_MEMORYREAD_SIZE, IARG_UINT32,
•
ACCESS_TYPE_LOAD, IARG_END);
• if( INS_IsMemoryWrite(ins) ) // If instructions writes
•
// to memory
•
INS_InsertCall(ins, IPOINT_BEFORE,
•
(AFUNPTR) MemoryReference,
•
IARG_THREAD_ID, IARG_MEMORYWRITE_EA,
•
IARG_MEMORYWRITE_SIZE, IARG_UINT32,
•
ACCESS_TYPE_STORE, IARG_END);
•}
Pin Tool
MAIN
CMP$im – Instrument Memory
References
Software & Services Group
31
CMP$im – Analyze Memory
References
•VOID MemoryReference(
•
int tid, ADDRINT addrStart, int size, int type)
•{
•
for(addr=addrStart; addr<(addrStart+size);
•
addr+=LINE_SIZE)
•
LookupHierarchy( tid, FIRST_LEVEL_CACHE, addr, type);
•}
•VOID LookupHierarchy(
•
int tid, int level, ADDRINT addr, int accessType) {
•
result = cacheHier[tid][cacheLevel]->Lookup(
•
addr, accessType );
•
if( result == CACHE_MISS ) {
•
if( level == LAST_LEVEL_CACHE ) return;
•
if( IsShared(level) ) AcquireLock(&lock[level], tid);
•
LookupHierarchy(tid, level+1, addr, accessType);
•
ReleaseLock(&lock[level]);
•
}
•}
32
INSTR
ROUTINES
CacheHierarchy[MAX_NUM_THREADS][MAX_NUM_LEVELS];
MAIN
•CACHE_t
ANALYSIS
ROUTINES
•#include “cache_model.h”
Synchronization
Software & Services Group
point
2MB LLC Cache Behavior – 4 Threads,
AMMP
10 mil phase
0
Private
Cache Miss Rate
cumulative
0
Shared
Cache Miss Rate
•
Miss Rate:
•
Shared caches have
better hit rate when
compared to private
caches
– Private Cache:
75%
– Shared Cache:
50%
Instruction Count (billions)
Software & Services Group
33
Shared Refs & Shared Caches…
Cache Miss
% Total Accesses
A
1 Thread
2 Thread
3 Thread
(4 Threaded Run)
B
• Workloads have
•
Miss Rate
Private LLC
Miss Rate
GeneNet – 16MB LLC
Shared LLC
34
4 Thread
different phases of
execution
Shared caches BETTER
during phases when
shared data is
referenced frequently
HPCA’06: Jaleel et al.
Software & Services Group
34
Intel Thread Checker
•Detect data races
Paul Petersen,
Zhiqiang Ma
•Instrumentation
– Memory operations
– Synchronization operations
•Analysis
– Use dynamic history of lock acquisition and release
to form a partial order of memory references
[Lamport 1978]
– Unordered read/write and write/write pairs to same
location are races
Software & Services Group
35
a documented data race in the
art benchmark is detected
Software & Services Group
36
36
PinPlay : Workload capture and
deterministic replay
•Problem : Multi-threaded programs are inherently nondeterministic making their analysis, simulation,
debugging very challenging
•Solution: PinPlay : A Pin-based framework for
capturing an execution of multi-threaded program and
App and input
replaying it deterministically under Pin
not needed
Application
input
Application
Pin
logger
37
PinPlay
LOGS
Pin
Replayer
once we have
the log
Deterministic
replay on any
machine
Harish Patil & Cristiano Pereira
Software & Services Group
Joint work with Brad Calder, UCSD
Logging to provide deterministic
behavior
•Start with checkpoint: memory image of code and
data
•A thread is deterministic if every loads sees either:
– Data from original checkpoint
– Or a value computed and stored on the thread
•Potential non-determinism when a load sees a
memory location written by an external agent
– Another thread
– Or system call, DMA, etc.
•Log these values with timestamps
Software & Services Group
38
Example
Program execution
Thread T1
Thread T2
1: Store A
DirEntry: [A:D]
Last writer id:
1: Load F
WAW
T1:
2: Store A
3: Load F
Last writer id:
WAR
3: Store F
T2: 2
DirEntry: [E:H]
RAW
2: Load A
12
T2
T1
T1:
3
T2:
T1
13
Last_writer
SMO logs:
T1 2 T2 2
T1 3 T2 3
T2 2 T1 1
Last access to the DirEntry
Thread T2 cannot execute memory
reference 2 until T1 executes its
memory reference 1
Thread T1 cannot execute memory reference 2
until T2 executes its memory reference 2
Software & Services Group
39
Applying multi-threaded tracing to
software tools
•
Debugging. Customer interested in debugging tools
derived from PinPlay
–
–
–
Capture bug at customer, bring home log to debug
Capture multi-threaded “heisenbug”, replay multiple times
How: combine PinPlay tracing with transparent debugging
PinPlay
LOGS
Pin
Replayer
Debug
Agent
Pin debug agent enables
custom debugger commands
debugger
Standard
protocol
Software & Services Group
40
Reducing Instrumentation
Overhead
Total Overhead = Pin Overhead + Pintool Overhead
~5% for SPECfp and ~50% for SPECint
Pin team’s job is to minimize this
Usually much larger than pin overhead
Pintool writers can help minimize this!
Software & Services Group
41
Reducing the Pintool’s Overhead
Pintool’s Overhead
Instrumentation
Routines
Overhead
Analysis
Routines
Overhead
+
x Work required in the
Analysis Routine
Frequency of calling
an Analysis Routine
Work required for transiting
to Analysis Routine
+
Work done inside
Analysis Routine
Software & Services Group
42
Reducing Work in Analysis Routines
•Key: Shift computation from analysis routines to
instrumentation routines whenever possible
•This usually has the largest speedup
Software & Services Group
43
Counting control flow edges
jne
60
40
100
ret
40
40
call
jmp
60
jne
1
Software & Services Group
44
Edge Counting: a Slower Version
•...
•void docount2(ADDRINT src, ADDRINT dst, INT32 taken)
•{
•
COUNTER *pedg = Lookup(src, dst);
•
pedg->count += taken;
•}
Analysis
•void Instruction(INS ins, void *v) {
•
if (INS_IsBranchOrCall(ins))
•
{
•
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount2,
•
IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR,
•
IARG_BRANCH_TAKEN, IARG_END);
•
}
•}
•...
Instrumentation
Software & Services Group
45
Edge Counting: a Faster Version
•void docount(COUNTER* pedge, INT32 taken) {
•
pedg->count += taken;
•}
•void docount2(ADDRINT src, ADDRINT dst, INT32 taken) {
•
COUNTER *pedg = Lookup(src, dst);
•
pedg->count += taken;
•}
Analysis
•void Instruction(INS ins, void *v) {
•
if (INS_IsDirectBranchOrCall(ins)) {
•
COUNTER *pedg = Lookup(INS_Address(ins),
•
INS_DirectBranchOrCallTargetAddress(ins));
•
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) docount,
•
IARG_ADDRINT, pedg, IARG_BRANCH_TAKEN, IARG_END);
•
} else
•
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) docount2,
•
IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR,
•
IARG_BRANCH_TAKEN, IARG_END);
•}
•…
Instrumentation
Software & Services Group
46
Analysis Routines: Reduce Call
Frequency
•Key: Instrument at the largest granularity
whenever possible
Instead of inserting one call per instruction
Insert one call per basic block or trace
Software & Services Group
47
Slower Instruction Counting
counter++;
sub $0xff, %edx
counter++;
cmp %esi, %edx
counter++;
jle <L1>
counter++;
mov $0x1, %edi
counter++;
add $0x10, %eax
Software & Services Group
48
Faster Instruction Counting
Counting at BBL level
Counting at Trace level
counter += 3
sub $0xff, %edx
sub
$0xff, %edx
cmp
%esi, %edx
cmp
%esi, %edx
jle <L1>
counter += 2
mov $0x1, %edi
jle
<L1>
add
add $0x10, %eax
counter += 5
$0x10, %eax
mov
$0x1, %edi
counter+=3
L1
Software & Services Group
49
Reducing Work for Analysis Transitions
•Reduce number of arguments to analysis
routines
– Inline analysis routines
– Pass arguments in registers
– Instrumentation scheduling
Software & Services Group
50
Reduce Number of Arguments
•Eliminate arguments only used for debugging
•Instead of passing TRUE/FALSE, create 2 analysis
functions
– Instead of inserting a call to:
Analysis(BOOL val)
– Insert a call to one of these:
AnalysisTrue()
AnalysisFalse()
– IARG_CONTEXT is very expensive (> 10
arguments)
Software & Services Group
51
Inlining
Not-inlinable
Inlinable
int docount1(int i) {
int docount0(int i) {
if (i == 1000)
x[i]++
x[i]++;
return x[i];
}
return x[i];
}
Not-inlinable
int docount2(int i) {
x[i]++;
printf(“%d”, i);
Not-inlinable
void docount3() {
for(i=0;i<100;i++)
x[i]++;
Pin will inline analysis functions into
return x[i];
}
jitted application
code
}
Software & Services Group
52
Inlining
•Use the –log_inline invocation switch to record inlining decisions in pin.log
pin –log_inline –t mytool – app
•Look in pin.log
Analysis function (0x2a9651854c) from mytool.cpp:53
INLINED
Analysis function (0x2a9651858a) from mytool.cpp:178 NOT INLINED
The last instruction of the first BBL fetched is not a ret instruction
•Look at source or disassembly of the function in mytool.cpp at line 178
0x0000002a9651858a
0x0000002a9651858b
0x0000002a9651858e
0x0000002a96518595
0x0000002a96518597
0x0000002a9651859e
0x0000002a965185a4
push rbp
mov rbp, rsp
mov rax, qword ptr [rip+0x3ce2b3]
inc dword ptr [rax]
mov rax, qword ptr [rip+0x3ce2aa]
cmp dword ptr [rax], 0xf4240
jnz 0x11
– The function could not be inlined because it contains a control-flow changing
instruction (other than ret)
Software & Services Group
53
Conditional Inlining
• Inline a common scenario where the analysis
routine has a single “if-then”
– The “If” part is always executed
– The “then” part is rarely executed
– Useful cases:
1. “If” can be inlined, “Then” is not
2. “If” has small number of arguments, “then” has many arguments
(or IARG_CONTEXT)
• Pintool writer breaks analysis routine into two:
– INS_InsertIfCall (ins, …, (AFUNPTR)doif, …)
– INS_InsertThenCall (ins, …, (AFUNPTR)dothen, …)
Software & Services Group
54
IP-Sampling (a Slower Version)
const INT32 N = 10000; const INT32 M = 5000;
INT32 icount = N;
VOID IpSample(VOID* ip) {
--icount;
if (icount == 0) {
fprintf(trace, “%p\n”, ip);
icount = N + rand()%M; //icount is between <N, N+M>
}
}
VOID Instruction(INS ins, VOID *v) {
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)IpSample,
IARG_INST_PTR, IARG_END);
}
Software & Services Group
55
IP-Sampling (a Faster Version)
INT32 CountDown() {
--icount;
inlined
return (icount==0);
}
VOID PrintIp(VOID *ip) {
fprintf(trace, “%p\n”, ip);
not inlined
icount = N + rand()%M; //icount is between <N, N+M>
}
VOID Instruction(INS ins, VOID *v) {
// CountDown() is always called before an inst is executed
INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)CountDown,
IARG_END);
// PrintIp() is called only if the last call to CountDown()
// returns a non-zero value
INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)PrintIp,
IARG_INST_PTR, IARG_END);
}
Software & Services Group
56
Optimizing Your Pintools Summary
• Baseline Pin has fairly low overhead (~5-20%)
• Adding instrumentation can increase overhead
significantly, but you can help!
1. Move work from analysis to instrumentation
routines
2. Explore larger granularity instrumentation
3. Explore conditional instrumentation
4. Understand when Pin can inline instrumentation
Software & Services Group
57
Part3: Deeper into Pin API
• Agenda
– memtrace_simple tool
– membuffer_simple tool
– branchbuffer_simple tool
– Symbols DebugInfo
– Probe-Mode
– Multi-Threading
Software & Services Group
58
memtrace_simple
• Tool code collects pairs of {appIP, memAddr} of
memory accessing instructions into a per-thread
buffer
– Remember: It is the application thread(s) that execute the
Pin Tool code.
• When the buffer becomes full, tool code processes
the entries in the buffer, resets the collection to
start at the beginning of the buffer, then execution
continues – and so-forth.
• Is a representative of many memory-trace
processing tools
Software & Services Group
59
memtrace_simple
• Tool code must
– Instrument each memory accessing instruction
– Determine where in the buffer the {appIP, memAddr} of the
instruction should be written
– Determine when the buffer becomes full
• Will instrument instructions on Trace level – i.e.
TRACE_AddInstrumentFunction(Trace, 0);
– Not all instructions in the trace will necessarily execute
each time trace is executed – because of early exits.
• Will try to allocate, in the buffer, maximum space
needed by trace at the trace start – if not enough
space => buffer is full
Software & Services Group
60
memtrace_simple
• Instrumentation code for each memory accessing
instruction in the trace will write it’s
{appIP, memAddr} pair to a constant offset from
the start of the trace in the buffer.
– Empty pairs (those instructions that were NOT executed)
will be denoted by having an appIP==0.
Software & Services Group
61
memtrace_simple
Trace
If
endOf(Previous)TraceReg
+ TotalSizeOccupiedByTraceInBuffer > endOfBufferReg
Then
Buffer
Call BufferFull
endOfTraceReg += TotalSizeOccupiedByTraceInBuffer
Early Exit
appIP
memAddr
Trace Exit
appIP
memAddr
endOfTraceReg
TotalSizeO
ccupiedBy
TraceInBu
ffer
appIP
Non memory access ins
Instrumentation code for following memory access ins
Memory access ins
memAddr
endOfBufferReg
Software & Services Group
62
memtrace_simple
Trace
If
endOf(Previous)TraceReg
+ TotalSizeOccupiedByTraceInBuffer > endOfBufferReg
Then
Call BufferFull
endOfTraceReg += TotalSizeOccupiedByTraceInBuffer
Buffer
appIP
endOfTraceReg
memAddr
appIP
memAddr
Early Exit
appIP
memAddr
Trace Exit
Non memory access ins
Instrumentation code for following memory access ins
Memory access ins
endOfTraceReg
TotalSizeO
ccupiedBy
TraceInBu
ffer
endOfBufferReg
Software & Services Group
63
memtrace_simple
• Tool will:
– iterate thru all INSs of the Trace
– Record which ones need to be instrumented (access memory)
– Record the ins, the memop, the offset from start of the trace in the buffer
where the {appIP, memAddr} pair of this ins should be written
– Get a sum of the TotalSizeOccupiedByTraceInBuffer
– Insert the IF-THEN sequence at the beginning of the trace
– Insert the update of endOfTraceReg just after the IF-THEN
sequence
– iterate thru recorded (memory accessing) INSs of the Trace
– Insert the instrumentation code before each recorded memory
accessing instruction
– this is the code that writes the {appIP, memAddr} pair into the buffer at the
designated offset (from start of trace) for this INS.
– endOfTraceReg and endOfBufferReg are virtual registers allocated
by Pin to the Pin tool.
Software & Services Group
64
memtrace_simple
TLS_KEY appThreadRepresentitiveKey; // Pin TLS key
REG endOfTraceInBufferReg; // Pin virtual Reg that will hold the pointer to the end of the trace data in
// the buffer
REG endOfBufferReg; // Pin virtual Reg that will hold the pointer to the end of the buffer
struct MEMREF {
ADDRINT appIP;
ADDRINT memAddr;
} ; // structure of the {appIP, memAddr} pair of a memory accessing ins in the buffer
int main(int argc, char * argv[])
{
PIN_Init(argc,argv) ;
// Pin TLS slot for holding the object that represents the application thread
appThreadRepresentitiveKey = PIN_CreateThreadDataKey(0);
// get the registers to be used in each thread for managing the per-thread buffer
endOfTraceInBufferReg = PIN_ClaimToolRegister();
endOfBufferReg
= PIN_ClaimToolRegister();
TRACE_AddInstrumentFunction(TraceAnalysisCalls, 0);
PIN_AddThreadStartFunction(ThreadStart, 0);
PIN_AddThreadFiniFunction(ThreadFini, 0);
PIN_AddFiniFunction(Fini, 0);
}
PIN_StartProgram();
Software & Services Group
65
memtrace_simple
KNOB<UINT32> KnobNumBytesInBuffer(KNOB_MODE_WRITEONCE, "pintool", "num_bytes_in_buffer",
"0x100000", "number of bytes in buffer");
APP_THREAD_REPRESENTITVE::APP_THREAD_REPRESENTITVE(THREADID myTid) {
_buffer = new char[KnobNumBytesInBuffer.Value()]; // Allocate the buffer
_numBuffersFilled = 0;
_numElementsProcessed = 0;
_myTid = myTid;
}
char * APP_THREAD_REPRESENTITVE::Begin() { return _buffer; }
char * APP_THREAD_REPRESENTITVE:: End()
{ return _buffer + KnobNumBytesInBuffer.Value(); }
VOID ThreadStart(THREADID tid, CONTEXT *ctxt, INT32 flags, VOID *v) // Pin callback on thread
// creation
{
// There is a new APP_THREAD_REPRESENTITVE object for every thread
APP_THREAD_REPRESENTITVE * appThreadRepresentitive
= new APP_THREAD_REPRESENTITVE(tid);
// A thread will need to look up its APP_THREAD_REPRESENTITVE, so save pointer in Pin TLS
PIN_SetThreadData(appThreadRepresentitiveKey, appThreadRepresentitive, tid);
// Initialize endOfTraceInBufferReg to point at beginning of buffer
PIN_SetContextReg(ctxt, endOfTraceInBufferReg,
reinterpret_cast<ADDRINT>(appThreadRepresentitive->Begin()));
}
// Initialize endOfBufferReg to point at end of buffer
PIN_SetContextReg(ctxt, endOfBufferReg,
reinterpret_cast<ADDRINT>(appThreadRepresentitive->End()));
Software & Services Group
66
memtrace_simple
void TraceAnalysisCalls(TRACE trace, void *) /*TRACE_AddInstrumentFunction(TraceAnalysisCalls, 0)*/ {
// Go over all BBLs of the trace and for each BBL determine and record the INSs which need
// to be instrumented - i.e. the ins requires an analysis call
TRACE_ANALYSIS_CALLS_NEEDED traceAnalysisCallsNeeded;
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
DetermineBBLAnalysisCalls(bbl, &traceAnalysisCallsNeeded);
// If No memory accesses in this trace
if (traceAnalysisCallsNeeded.NumAnalysisCallsNeeded() == 0) return;
// APP_THREAD_REPRESENTITVE::CheckIfNoSpaceForTraceInBuffer will determine if there are NOT enough
// available bytes in the buffer. If there are NOT then it returns TRUE and the BufferFull function is called
TRACE_InsertIfCall(trace, IPOINT_BEFORE,
AFUNPTR(APP_THREAD_REPRESENTITVE::CheckIfNoSpaceForTraceInBuffer),
IARG_FAST_ANALYSIS_CALL,
IARG_REG_VALUE, endOfTraceInBufferReg, // previous trace
IARG_REG_VALUE, endOfBufferReg,
IARG_UINT32, traceAnalysisCallsNeeded.TotalSizeOccupiedByTraceInBuffer(),
IARG_END);
TRACE_InsertThenCall(trace, IPOINT_BEFORE, AFUNPTR(APP_THREAD_REPRESENTITVE::BufferFull),
IARG_FAST_ANALYSIS_CALL,
IARG_REG_VALUE, endOfTraceInBufferReg,
IARG_THREAD_ID,
IARG_RETURN_REGS, endOfTraceInBufferReg,
IARG_END);
TRACE_InsertCall(trace, IPOINT_BEFORE,
AFUNPTR(APP_THREAD_REPRESENTITVE::AllocateSpaceForTraceInBuffer),
IARG_FAST_ANALYSIS_CALL,
IARG_REG_VALUE, endOfTraceInBufferReg,
IARG_UINT32, traceAnalysisCallsNeeded.TotalSizeOccupiedByTraceInBuffer(),
IARG_RETURN_REGS, endOfTraceInBufferReg,
IARG_END);
// Insert Analysis Calls for each INS on the trace that was recorded as needing one
traceAnalysisCallsNeeded.InsertAnalysisCalls();
}
Software & Services Group
67
memtrace_simple
static ADDRINT PIN_FAST_ANALYSIS_CALL
APP_THREAD_REPRESENTITVE::CheckIfNoSpaceForTraceInBuffer ( // Pin will inline this function
char * endOfPreviousTraceInBuffer,
char * bufferEnd,
ADDRINT totalSizeOccupiedByTraceInBuffer)
{
return (endOfPreviousTraceInBuffer + totalSizeOccupiedByTraceInBuffer >= bufferEnd);
}
static char * PIN_FAST_ANALYSIS_CALL
APP_THREAD_REPRESENTITVE::BufferFull ( // Pin will NOT inline this function
char *endOfTraceInBuffer, ADDRINT tid)
{
// Get this thread’s APP_THREAD_REPRESENTITVE from the Pin TLS
APP_THREAD_REPRESENTITVE * appThreadRepresentitive
= static_cast<APP_THREAD_REPRESENTITVE*>
(PIN_GetThreadData(appThreadRepresentitiveKey, tid));
appThreadRepresentitive->ProcessBuffer(endOfTraceInBuffer);
}
// After processing the buffer, move the endOfTraceInBuffer back to the beginning of the buffer
endOfTraceInBuffer = appThreadRepresentitive->Begin();
return endOfTraceInBuffer;
static char * PIN_FAST_ANALYSIS_CALL
APP_THREAD_REPRESENTITVE::AllocateSpaceForTraceInBuffer (// Pin will inline this function
{
}
char * endOfPreviousTraceInBuffer,
ADDRINT totalSizeOccupiedByTraceInBuffer)
return (endOfPreviousTraceInBuffer + totalSizeOccupiedByTraceInBuffer);
Software & Services Group
68
memtrace_simple
class ANALYSIS_CALL_INFO
public:
ANALYSIS_CALL_INFO(INS ins, UINT32 offsetFromTraceStartInBuffer, UINT32 memop) :
_ins(ins),
_offsetFromTraceStartInBuffer(offsetFromTraceStartInBuffer), _memop (memop) {}
void InsertAnalysisCall(INT32 sizeofTraceInBuffer);
private:
INS _ins; INT32 _offsetFromTraceStartInBuffer; UINT32 _memop;
{
};
class TRACE_ANALYSIS_CALLS_NEEDED
{
public:
TRACE_ANALYSIS_CALLS_NEEDED() : _numAnalysisCallsNeeded(0), _currentOffsetFromTraceStartInBuffer(0) {}
UINT32 NumAnalysisCallsNeeded() const { return _numAnalysisCallsNeeded; }
UINT32 TotalSizeOccupiedByTraceInBuffer() const { return _currentOffsetFromTraceStartInBuffer; }
void RecordAnalysisCallNeeded(INS ins, UINT32 memop) {
_analysisCalls.push_back(ANALYSIS_CALL_INFO(ins, _currentOffsetFromTraceStartInBuffer,
memop));
_currentOffsetFromTraceStartInBuffer += sizeof(MEMREF);
_numAnalysisCallsNeeded++;
}
void InsertAnalysisCalls();
private:
INT32 _currentOffsetFromTraceStartInBuffer;
INT32 _numAnalysisCallsNeeded;
vector<ANALYSIS_CALL_INFO> _analysisCalls;
};
void DetermineBBLAnalysisCalls (BBL bbl,
TRACE_ANALYSIS_CALLS_NEEDED * traceAnalysisCallsNeeded)
{
for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins))
{
// Iterate over each memory operand of the instruction.
for (UINT32 memOp = 0; memOp < INS_MemoryOperandCount(ins); memOp++)
// Record that an analysis call is needed, along with the info needed to generate the analysis
// call
traceAnalysisCallsNeeded->RecordAnalysisCallNeeded(ins, memOp);
}
}
•
69
Software & Services Group
memtrace_simple
static void PIN_FAST_ANALYSIS_CALL
APP_THREAD_REPRESENTITVE::RecordMEMREFInBuffer ( // Pin will inline this function
char* endOfTraceInBuffer, ADDRINT offsetFromEndOfTrace, ADDRINT appIp, ADDRINT memAddr)
{
*reinterpret_cast<ADDRINT*>(endOfTraceInBuffer+ offsetFromEndOfTrace) = appIp;
*reinterpret_cast<ADDRINT*>(endOfTraceInBuffer+ offsetFromEndOfTrace +sizeof(ADDRINT))
= memAddr;
}
void ANALYSIS_CALL_INFO::InsertAnalysisCall(INT32 sizeofTraceInBuffer)
{
/* the place in the buffer where the {appIp, memAddr} of this _ins should be recorded is
computed by: endOfTraceInBufferReg
-sizeofTraceInBuffer + _offsetFromTraceStartInBuffer(of this _ins) */
INS_InsertCall(_ins, IPOINT_BEFORE,
AFUNPTR(APP_THREAD_REPRESENTITVE::RecordMEMREFInBuffer),
IARG_FAST_ANALYSIS_CALL,
IARG_REG_VALUE, endOfTraceInBufferReg,
IARG_ADDRINT, ADDRINT(_offsetFromTraceStartInBuffer - sizeofTraceInBuffer),
IARG_INST_PTR,
IARG_MEMORYOP_EA, _memop,
IARG_END);
}
void TRACE_ANALYSIS_CALLS_NEEDED::InsertAnalysisCalls()
{// Iterate over the recorded ANALYSIS_CALL_INFO elements – insert the analysis call
for (vector<ANALYSIS_CALL_INFO>::iterator c = _analysisCalls.begin();
c != _analysisCalls.end();
c++)
c->InsertAnalysisCall(TotalSizeOccupiedByTraceInBuffer());
}
Software & Services Group
70
membuffer_simple
• Since managing a per-thread buffer is a necessity of a large
class of Pin tools: Provide Pin APIs to make it (more) easy.
• Pin Buffering API, abstracts away the need for a Pin tool to
manage per-thread buffers
• PIN_DefineTraceBuffer
– Define a per-thread buffer that each application trace can write
data to
• INS_InsertFillBuffer
– Instrumentation code is generated to write the desired data into
the buffer
– This code is inlined
• Tool defined BufferFull function, instrumentation code will
cause this function to be called when the buffer becomes full
Software & Services Group
71
membuffer_simple
• Pin Buffering API actually works somewhat different
than memtrace
– Instrumentation code will insert the data generated by an
INS into the buffer immediately after the data generated
by the previously executed instrumented INS
– Better buffer utilization
– Requires the instrumentation to update the next buffer location
to write to – this was not required in the memtrace
implementatio
– All this is invisible to the Pin tool writer
• membuffer_simple is a Pin tool that uses the Pin
Buffering API to do the same memory access
recording that memtrace_simple does
Software & Services Group
72
membuffer_simple
KNOB<UINT32> KnobNumPagesInBuffer(KNOB_MODE_WRITEONCE, "pintool",
"num_pages_in_buffer", "256", "number of pages in buffer");
// Struct of memory reference written to the buffer
struct MEMREF
ADDRINT appIP;
ADDRINT memAddr;
{
};
// The buffer ID returned by the one call to PIN_DefineTraceBuffer
BUFFER_ID bufId;
TLS_KEY appThreadRepresentitiveKey;
int main(int argc, char * argv[])
PIN_Init(argc,argv) ;
{
// Pin TLS slot for holding the object that represents an application thread
appThreadRepresentitiveKey = PIN_CreateThreadDataKey(0);
// Define the buffer that will be used – buffer is allocated to each thread when the thread starts
//running
bufId = PIN_DefineTraceBuffer(sizeof(struct MEMREF), KnobNumPagesInBuffer,
BufferFull, // This Pin tool function will be called when buffer is full
0);
INS_AddInstrumentFunction(Instruction, 0); // The Instruction function will use the Pin Buffering
// API to insert the instrumentation code that writes
// the MEMREF of a memory accessing INS into the buffer
PIN_AddThreadStartFunction(ThreadStart, 0);
PIN_AddThreadFiniFunction(ThreadFini, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram();
}
Software & Services Group
73
membuffer_simple
/*
* Pin Callback called, by application thread, when a buffer fills up, or the thread exits
* Pin will NOT inline this function
* @param[in] id
buffer handle
* @param[in] tid
id of owning thread
* @param[in] ctxt
application context
* @param[in] buf
actual pointer to buffer
* @param[in] numElements number of records
* @param[in] v
callback value
* @return A pointer to the buffer to resume filling.
*/
VOID * BufferFull(BUFFER_ID id, THREADID tid, const CONTEXT *ctxt, VOID *buf,
UINT64 numElements, VOID *v)
{
// retrieve the APP_THREAD_REPRESENTITVE* of this thread from the Pin TLS
APP_THREAD_REPRESENTITVE * appThreadRepresentitive
= static_cast<APP_THREAD_REPRESENTITVE*>( PIN_GetThreadData(
appThreadRepresentitiveKey, tid ) );
appThreadRepresentitive->ProcessBuffer(buf, numElements);
return buf;
}}
Software & Services Group
74
membuffer_simple
VOID Instruction (INS ins, VOID *v)
{
UINT32 numMemOperands = INS_MemoryOperandCount(ins);
// Iterate over each memory operand of the instruction.
for (UINT32 memOp = 0; memOp < numMemOperands ; memOp++)
{ // Add the instrumentation code to write the appIP and memAddr
// of this memory operand into the buffer
// Pin will inline the code that writes to the buffer
INS_InsertFillBuffer(ins, IPOINT_BEFORE, bufId,
IARG_INST_PTR, offsetof(struct MEMREF, appIP),
IARG_MEMORYOP_EA, memOp,
offsetof(struct MEMREF, memAddr),
IARG_END);
}
}
Software & Services Group
75
branchbuffer_simple
• Use Pin Buffering API to collect a branch trace:
– For each executed branch instruction record:
– appIP of the branch instruction
– targetAddress of the branch instruction
– branchTaken boolean
Software & Services Group
76
branchbuffer_simple
KNOB<UINT32> KnobNumPagesInBuffer(KNOB_MODE_WRITEONCE, "pintool",
"num_pages_in_buffer", "256", "number of pages in
buffer");
struct BRANCH_INFO { // This is the structure of the data that will be written into the buffer
ADDRINT appIP;
ADDRINT targetAddress;
BOOL
branchTaken;
};
int main(int argc, char *argv[])
{
PIN_Init(argc,argv);
bufId = PIN_DefineTraceBuffer(sizeof(BRANCH_INFO), KnobNumPagesInBuffer, BufferFull,
0);
// Register function to be called to instrument traces
TRACE_AddInstrumentFunction(Trace, 0);
// Register function to be called when the application exits
PIN_AddFiniFunction(Fini, 0);
}
// Start the program, never returns
PIN_StartProgram();
Software & Services Group
77
branchbuffer_simple
void Trace(TRACE tr, void* V) // TRACE_AddInstrumentFunction(Trace, 0);
{
for(BBL bbl = TRACE_BblHead(tr); BBL_Valid(bbl); bbl=BBL_Next(bbl))
{
if (INS_IsBranchOrCall(BBL_InsTail(bbl))) // The branch instruction, if it exists, will always
// be the last in the BBL
{
}
}
INS_InsertFillBuffer(BBL_InsTail(bbl),
IPOINT_BEFORE,
bufId,
IARG_INST_PTR, offsetof(BRANCH_INFO, appIP),
IARG_BRANCH_TARGET_ADDR, offsetof(BRANCH_INFO, targetAddress),
IARG_BRANCH_TAKEN, offsetof(BRANCH_INFO, branchTaken),
IARG_END);
}
Software & Services Group
78
Symbols
• PIN_InitSymbols()
– Pin will use whatever symbol information is available
–
–
–
–
Debug info in the app
Pdb files
Export Tables
On Windows uses dbghelp
• Use symbols to instrument/wrap/replace specific
functions
– wrap/replace: see malloc replacement examples in intro
• Access application debug information from a Pin
tool
Software & Services Group
79
Symbols: Instrument malloc and free
int main(int argc, char *argv[])
{
// Initialize pin symbol manager
PIN_InitSymbols();
PIN_Init(argc,argv);
// Register the function ImageLoad to be called each time an image is loaded in the process
// This includes the process itself and all shared libraries it loads (implicitly or explicitly)
IMG_AddInstrumentFunction(ImageLoad, 0);
// Never returns
PIN_StartProgram();
}
Software & Services Group
80
Symbols: Instrument malloc and free
VOID ImageLoad(IMG img, VOID *v) // Pin Callback. IMG_AddInstrumentFunction(ImageLoad, 0);
{
// Instrument the malloc() and free() functions. Print the input argument
// of each malloc() or free(), and the return value of malloc().
RTN mallocRtn = RTN_FindByName(img, "_malloc"); // Find the malloc() function.
if (RTN_Valid(mallocRtn))
{
RTN_Open(mallocRtn);
// Instrument malloc() to print the input argument value and the return value.
RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MallocBefore,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
}
}
RTN_Close(mallocRtn);
RTN freeRtn = RTN_FindByName(img, "_free"); // Find the free() function.
if (RTN_Valid(freeRtn))
{
RTN_Open(freeRtn);
// Instrument free() to print the input argument value.
RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)FreeBefore,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_Close(freeRtn);
}
Software & Services Group
81
Symbols: Instrument malloc
Handling name-mangling and multiple
symbols at same address
VOID Image(IMG img, VOID *v) // IMG_AddInstrumentFunction(Image, 0);
{
// Walk through the symbols in the symbol table.
for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
{
string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);
if (undFuncName == "malloc") // Find the malloc function.
{
RTN mallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(mallocRtn))
{
RTN_Open(mallocRtn);
// Instrument to print the input argument value and the return value.
RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MallocBefore,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
IARG_FUNCRET_EXITPOINT_VALUE,
IARG_END);
}
}
}
}
RTN_Close(mallocRtn);
Software & Services Group
82
Symbols: Accessing Application Debug
Info from a Pin Tool
VOID Instruction(INS ins, VOID *v) // INS_AddInstrumentFunction(Instruction, 0);
{
UINT32 numMemOperands = INS_MemoryOperandCount(ins);
// Iterate over each memory operand of the instruction.
for (UINT32 memOp = 0; memOp < numMemOperands ; memOp++)
{
if (INS_MemoryOperandIsWritten(ins, memOp))
{ // Insert instrumentation code to catch a memory overwrite
INS_InsertIfCall(ins, IPOINT_BEFORE,
AFUNPTR(AnalyzeMemWrite),
IARG_FAST_ANALYSIS_CALL,
IARG_MEMORYOP_EA, memop,
IARG_MEMORYWRITE_SIZE,
IARG_END);
INS_InsertThenCall(ins, IPOINT_BEFORE,
AFUNPTR(MemoryOverWriteAt),
IARG_FAST_ANALYSIS_CALL,
IARG_INST_PTR,
IARG_MEMORYOP_EA, memop,
IARG_MEMORYWRITE_SIZE,
IARG_END);
}
}
}
Software & Services Group
83
Symbols: Accessing Application Debug
Info from a Pin Tool
KNOB<ADDRINT> KnobMemAddrBeingOverwritten(KNOB_MODE_WRITEONCE, "pintool",
"mem_overwrite_addr", "256", "overwritten memaddr");
static ADDRINT PIN_FAST_ANALYSIS_CALL
AnalyzeMemWrite ( // Pin will inline this function, it is the IF part
ADDRINT memWriteAddr, UINT32 numBytesWritten)
{
// return 1 if this memory write overwrites the address specified by
// KnobMemAddrBeingOverwritten
return (memWriteAddr<= KnobMemAddrBeingOverwritten &&
(memWriteAddr + numBytesWritten) > KnobMemAddrBeingOverwritten);
}
static VOID PIN_FAST_ANALYSIS_CALL
MemoryOverWriteAt ( // Pin will NOT inline this function, it is the THEN part
ADDRINT appIP, ADDRINT memWriteAddr, UINT32 numBytesWritten)
{
INT32 column, lineNum;
string fileName;
PIN_GetSourceLocation (appIP, &column, &line, &fileName);
}
printf ("overwrite of %p from instruction at %p originating from file %s line %d col %d\n",
KnobMemAddrBeingOverwritten, appIP, fileName.c_str(), lineNum, column);
printf (" writing %d bytes starting at %p\n", numBytesWritten, memWriteAddr);
Software & Services Group
84
Probe Mode
• JIT Mode
– Pin creates a modified copy of the application onthe-fly
– Original code never executes
More flexible, more common approach
• Probe Mode
– Pin modifies the original application instructions
– Inserts jumps to instrumentation code
(trampolines)
Lower overhead (less flexible) approach
Software & Services Group
85
Pin Probe-Mode
• Probe mode is a method of using Pin to wrap or replace
application functions with functions in the tool. A jump
instruction (probe), which redirects the flow of control to the
replacement function is placed at the start of the specified
function.
• The bytes being overwritten are relocated, so that Pin can
provide the replacement function with the address of the first
relocated byte. This enables the replacement function to call
the replaced (original) function.
• In probe mode, the application and the replacement routine
are run natively (not Jitted). This improves performance, but
puts more responsibility on the tool writer. Probes can only be
placed on RTN boundaries, and should inserted within the
Image load callback. Pin will automatically remove the probes
when an image is unloaded.
• Many of the PIN APIs that are available in JIT mode are not
available in Probe mode.
Software & Services Group
86
A Sample Probe
– A probe is a jump instruction that overwrites
original instruction(s) in the application
– Instrumentation invoked with probes
– Pin copies/translates original bytes so probed (replaced)
functions can be called from the replacement function
Copy of entry point with
0x50000004: push
0x50000005: mov
0x50000007: push
0x50000008: push
0x50000009: jmp
original bytes:
%ebp
%esp,%ebp
%edi
%esi
0x400113d9
0x41481064: push %ebp
Original
function
entry point:
Entry
point
overwritten
with probe:
push 0x41481064
%ebp
0x400113d4: jmp
0x400113d5: mov %esp,%ebp
0x400113d7: push %edi
0x400113d8: push %esi
0x400113d9: push %ebx
// tool wrapper func
::::::::::::::::::::
0x414827fe: call 0x50000004 // call original func
Software & Services Group
87
PinProbes Instrumentation
• Advantages:
– Low overhead – few percent
– Less intrusive – execute original code
– Leverages Pin:
– API
– Instrumentation engine
• Disadvantages:
– More tool writer responsibility
– Routine-level granularity (RTN)
Software & Services Group
88
Using Probes to Replace/Wrap a
Function
• RTN_ReplaceSignatureProbed() redirects all calls
to application routine rtn to the specified
replacementFunction
– Can add IARG_* types to be passed to the replacement
routine, including pointer to original function and
IARG_CONTEXT.
– Replacement function can call original function.
• To use:
– Must use PIN_StartProgramProbed()
– Application prototype is required
Software & Services Group
89
Malloc Replacement Probe-Mode
#include "pin.H"
void * MallocWrapper(AFUNPTR pf_malloc, size_t size)
{ // Simulate out-of-memory every so often
void * res;
if (TimeForOutOfMem())
return (NULL);
res = pf_malloc(size);
return res; }
VOID ImageLoad (IMG img, VOID *v) {
if (strstr(IMG_Name(img).c_str(), "libc.so") ||
strstr(IMG_Name(img).c_str(), "MSVCR80") || strstr(IMG_Name(img).c_str(), "MSVCR90"))
{
RTN mallocRtn = RTN_FindByName(img, "malloc");
if ( RTN_Valid(mallocRtn) && RTN_IsSafeForProbedReplacement(mallocRtn) )
{
PROTO proto_malloc = PROTO_Allocate(PIN_PARG(void *), CALLINGSTD_DEFAULT, "malloc",
PIN_PARG(size_t), PIN_PARG_END() );
} }}
RTN_ReplaceSignatureProbed(mallocRtn,
AFUNPTR(MallocWrapper),
IARG_PROTOTYPE, proto_malloc,
IARG_ORIG_FUNCPTR,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
int main(int argc, CHAR *argv[]) {
PIN_InitSymbols();
PIN_Init(argc,argv));
IMG_AddInstrumentFunction(ImageLoad, 0);
PIN_StartProgramProbed();
}
Software & Services Group
90
Using Probes to Call Analysis
Functions
• RTN_InsertCallProbed() invokes the analysis
routine before or after the specified rtn
– Use IPOINT_BEFORE or IPOINT_AFTER
– Pin may NOT be able to find all AFTER points on
the function when it is running in Probe-Mode
– PIN IARG_TYPEs are used for arguments
• To use:
– Must use PIN_StartProgramProbed()
– Application prototype is required
Software & Services Group
91
Symbols: Instrument malloc
Handling name-mangling and multiple
symbols at same address Probe-Mode
VOID Image(IMG img, VOID *v) // IMG_AddInstrumentFunction(Image, 0);
{
// Walk through the symbols in the symbol table.
for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
{
string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);
if (undFuncName == "malloc") // Find the malloc function.
{
RTN mallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
if (RTN_Valid(mallocRtn))
{
RTN_Open(mallocRtn);
PROTO proto_malloc = PROTO_Allocate(PIN_PARG(void *), CALLINGSTD_DEFAULT, "malloc",
PIN_PARG(size_t), PIN_PARG_END() );
// Instrument to print the input argument value and the return value.
RTN_InsertCallProbed(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MallocBefore,
IARG_PROTOTYPE, proto_malloc,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_InsertCallProbed(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
IARG_PROTOTYPE, proto_malloc,
IARG_FUNCRET_EXITPOINT_VALUE,
IARG_END);
}
}
RTN_Close(mallocRtn);
}
}
Software & Services Group
92
Tool Writer Responsibilities
• No control flow into the instruction space where
probe is placed
– 6 bytes on IA-32, 7 bytes on Intel64, 1 bundle
on IA64
– Branch into “replaced” instructions will fail
– Probes at function entry point only
• Thread safety for insertion and deletion of probes
– During image load callback is safe
– Only loading thread has a handle to the image
• Replacement function has same behavior as
original
Software & Services Group
93
Multi-Threading
• Have shown a number of examples of Pin tools
supporting multi-threading
• Pin fully supports multi-threading
– Application threads execute jitted code including
instrumentation code (inlined and not inlined), without any
serialization introduced by Pin
– Instrumentation code can use Pin and/or OS synchronization
constructs to introduce serialization if needed.
– Will see examples of this in Part3
– System calls require serialized entry to the VM before and after
execution – BUT actual execution is NOT serialized
– Pin does NOT create any threads of it’s own
– Pin callbacks are serialized
– Including the BufferFull callback
– Jitting is serialized
– Only one application thread can be jitting code at any time
Software & Services Group
94
Multi-Threading
• Pin Tools, in Jit-Mode, can:
– Track Threads
– ThreadStart, ThreadFini callbacks
– IARG_THREAD_ID
– Use Pin TLS for thread-specific data
– Use Pin Locks to synchronize threads
– Create threads to do Pin Tool work
– Use Pin provided APIs to do this
– Otherwise these threads would be Jitted
– Details in Part3
Software & Services Group
95
Part3 Summary
• Saw Examples of
– Allocating Pin Registers for Pin Tool Use
– Pin IF-THEN instrumentation
– Changing register values in instrumentation code
– Changing register values in CONTEXT
– Knobs
– Pin TLS
– Pin Buffering API
– Using Symbol and Debug Info
– Probe-Mode
– Multi-Threading support
Software & Services Group
96
Part4: Advanced Pin API
“To boldly go where no PinHead has
gone before…”
• Agenda
– membuffer_threadpool tool
– Using multiple buffers in the Pin Buffering API
– Using Pin Tool Threads
– Using Pin and OS locks to synchronize threads
– System call instrumentation
– Instrumenting a process tree
– CONTEXT* and IARG_CONTEXT
– Managing Exceptions and Signals
– Accessing Decode API
– Pin Code-Cache API
– Transparent debugging, and extending the debugger
Software & Services Group
97
membuffer_threadpool
• Recall membuffer_simple:
– Uses Pin Buffering API
– One buffer for each thread
– Inlined call to INS_InsertFillBuffer writes instrumentation data
into the buffer
– Application threads execute jitted application and instrumentation code
– When buffer becomes full the Pin Tool defined BufferFull callback
is called (by the application thread)
– Process the data in the buffer
– After the buffer is processed it is set to be re-filled from the top
– Application thread continues executing jitted application and instrumentation
code
– All Pin callbacks are serialized
– Only one buffer is being processed at any time
Software & Services Group
98
membuffer_threadpool
• Improvement: Process buffers that become full asynchronously, allows
application code to continue executing while buffers are being processed.
– Pin Buffering API supports multiple buffers per-thread
– Each application thread will allocate a number of buffers.
– The buffers allocated by the thread can only be used by the allocating thread so:
– Each application thread will have a buffers-free list, holding all buffers that are not currently
full or being filled.
– Pin supports creating Pin Tool threads, these are NOT jitted and can be used to do
Pin Tool work asynchronously.
– A number of these threads will be created, their job is:
–
–
Process buffers that become full. These will be located on a global full-buffers list.
After processing, return them to the buffers-free list of the application thread that filled them
– Application threads execute jitted application code and instrumentation code – the
instrumentation code writes data into the buffers and when it detects that the buffer
is full calls the BufferFull callback.
– The BufferFull callback function will NOT process the buffer
– Remember it is executed by an application thread
– It places the buffer on the global full-buffers list
– It retrieves a free buffer from this application thread’s free buffer list and returns it as the next
buffer to fill.
Software & Services Group
99
membuffer_threadpool
Application thread
Pin Tool Processing thread
Buffer being filled
buffers-free list
buffers-full list
Application thread
buffers-free list
Buffer being filled
Buffer
becomes
full
BufferFull
function
executed
Buffer
becomes
full
BufferFull
function
executed
Buffer
Processing
finishes.
Buffer
returned to
owner’s
buffers-free
list
Pin Tool Processing thread
Software & Services Group
100
membuffer_threadpool
int main(int argc, char *argv[])
PIN_Init(argc,argv);
{
// Pin TLS slot for holding the object that represents an application thread
appThreadRepresentitiveKey = PIN_CreateThreadDataKey(0);
// Define the buffer that will be used –
bufId = PIN_DefineTraceBuffer(sizeof(struct MEMREF), KnobNumPagesInBuffer,
BufferFull, // This Pin tool function will be called when buffer is full
0);
TRACE_AddInstrumentFunction(Trace, 0); // add an instrumentation callback function
// add callbacks
PIN_AddThreadStartFunction(ThreadStart, 0);
PIN_AddThreadFiniFunction(ThreadFini, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_AddFiniUnlockedFunction(FiniUnlocked, 0); // Used for Pin Tool thread termination
/* It is safe to create internal threads in the tool's main procedure and spawn new
* internal threads from existing ones. All other places, like Pin callbacks and
* analysis routines in application threads, are not safe for creating internal threads. */
// NOTE: These threads are NOT jitted, Need to discuss when the threads actually start running
for (int i=0; i<KnobNumProcessingThreads; i++)
{
THREADID threadId; PIN_THREAD_UID threadUid;
threadId
= PIN_SpawnInternalThread (BufferProcessingThread, NULL, 0, &threadUid);
RecordToolThreadCreated(threadUid); /* Used for Pin Tool thread termination */ }
PIN_StartProgram(); /* Start the program, never returns */
}
Software & Services Group
101
membuffer_threadpool
static void RecordToolThreadCreated (PIN_THREAD_UID threadUid)
{ // Record the unique ID of the Pin Tool thread
BOOL insertStatus;
insertStatus = (uidSet.insert(threadUid)).second;
}
// The thread function of Pin Tool threads – this code runs natively: NO Jitting
static VOID BufferProcessingThread(VOID * arg)
{
processingThreadRunning = TRUE; // Indicate that thread has started running
THREADID myThreadId = PIN_ThreadId();
}
while (!doExit)
{
VOID *buf;
UINT64 numElements;
APP_THREAD_REPRESENTITVE *appThreadRepresentitive;
// Get full buffer from the full buffer list
fullBuffersListManager.GetBufferFromList(&buf ,&numElements,
&appThreadRepresentitive, myThreadId);
if (buf == NULL)
{ // this will happen at process termination time – when there are NO
ASSERTX(doExit);
// no buffers left to process
break;
}
// Process the full buffer
ProcessBuffer(buf, numElements, appThreadRepresentitive);
// Put the processed buffer back on the free buffer list of the application thread that owns it
appThreadRepresentitive->FreeBufferListManager()
->PutBufferOnList(buf, 0, appThreadRepresentitive, myThreadId);
}
Software & Services Group
102
membuffer_threadpool
/*! Pin Callback
* Called by, instrumentation code, when a buffer fills up, or the thread exits, so the buffer can be
processed
* Called in the context of the application thread
* @param[in] id
buffer handle
* @param[in] tid
id of owning thread
* @param[in] ctxt
application context
* @param[in] buf
actual pointer to buffer
* @param[in] numElements
number of records
* @param[in] v
callback value
* @return A pointer to the buffer to resume filling.
*/
VOID * BufferFull(BUFFER_ID id, THREADID tid, const CONTEXT *ctxt, VOID *buf,
UINT64 numElements, VOID *v)
{
// get the APP_THREAD_REPRESENTITVE of this app thread from the Pin TLS
APP_THREAD_REPRESENTITVE * appThreadRepresentitive
= static_cast<APP_THREAD_REPRESENTITVE*>( PIN_GetThreadData(
appThreadRepresentitiveKey, tid ) );
// Enqueue the full buffer, on the full-buffers list, and get the next buffer to fill, from this
// thread’s free buffer list
VOID *nextBuffToFill
= appThreadRepresentitive->EnqueFullAndGetNextToFill(buf, numElements);
}
return (nextBuffToFill);
Software & Services Group
103
membuffer_threadpool
VOID * APP_THREAD_REPRESENTITVE::EnqueFullAndGetNextToFill(VOID *fullBuf,
UINT64 numElements)
// cannot wait for Pin Tool threads to start running since this may cause deadlock
// because this app thread may be holding some OS resource that the Pin Tool
// thread needs to obtain in order to start - e.g. the LoaderLock
if ( !processingThreadRunning)
{ // process buffer in this app thread
ProcessBuffer(fullBuf, numElements, this);
return fullBuf;
}
if (!_buffersAllocated)
// now allocate the rest of the KnobNumBuffersPerAppThread buffers to be used
for (int i=0; i<KnobNumBuffersPerAppThread-1; i++)
_freeBufferListManager->PutBufferOnList(PIN_AllocateBuffer(bufId), 0, this, _myTid);
_buffersAllocated = TRUE;
{
{
}
// put the fullBuf on the full buffers list, on the Pin Tool processing
// threads will pick it from there, process it, and then put it on this app-thread's free buffer list
fullBuffersListManager.PutBufferOnList(fullBuf, numElements, this, _myTid);
// return the next buffer to fill.
// It is always taken from the free buffers list of this app thread. If the list is empty then this app
// thread will be blocked until one is placed there (by one of the Pin Tool buffer processing threads).
VOID *nextBufToFill;
UINT64 numElementsDummy;
APP_THREAD_REPRESENTITVE *appThreadRepresentitiveDummy;
_freeBufferListManager->GetBufferFromList(&nextBufToFill,
&numElementsDummy,
&appThreadRepresentitiveDummy,
_myTid);
ASSERTX(appThreadRepresentitiveDummy = this);
return nextBufToFill;
}
Software & Services Group
104
membuffer_threadpool
VOID Instruction (INS ins, VOID *v)
{
UINT32 numMemOperands = INS_MemoryOperandCount(ins);
// Iterate over each memory operand of the instruction.
for (UINT32 memOp = 0; memOp < numMemOperands ; memOp++)
{ // Add the instrumentation code to write the appIP and memAddr
// of this memory operand into the buffer
// Pin will inline the code that writes to the buffer
INS_InsertFillBuffer(ins, IPOINT_BEFORE, bufId,
IARG_INST_PTR, offsetof(struct MEMREF, appIP),
IARG_MEMORYOP_EA, memOp,
offsetof(struct MEMREF, memAddr),
IARG_END);
}
}
Software & Services Group
105
membuffer_threadpool
class BUFFER_LIST_MANAGER
{
public:
BUFFER_LIST_MANAGER();
VOID PutBufferOnList (VOID *buf, UINT64 numElements, APP_THREAD_REPRESENTITVE *appThreadRepresentitive,
THREADID tid)
// build the list element
BUFFER_LIST_ELEMENT bufferListElement;
bufferListElement.buf = buf;
bufferListElement.numElements = numElements;
bufferListElement.appThreadRepresentitive = appThreadRepresentitive;
GetLock(&_bufferListLock, tid+1);
// lock the list, using a Pin lock
_bufferList.push_back(bufferListElement); // insert the element at the end of the list
ReleaseLock(&_bufferListLock);
// unlock the list
WIND::ReleaseSemaphore(_bufferSem, 1, NULL); // signal that there is a buffer on the list
{
}
VOID GetBufferFromList (VOID **buf ,UINT64 *numElements, APP_THREAD_REPRESENTITVE **appThreadRepresentitive,
THREADID tid){
WIND::WaitForSingleObject (_bufferSem, INFINITE);
// wait until there is a buffer on the list
GetLock(&_bufferListLock, tid+1);
// lock the list
BUFFER_LIST_ELEMENT &bufferListElement = (_bufferList.front()); // retrieve the first element of the list
*buf = bufferListElement.buf;
*numElements = bufferListElement.numElements;
*appThreadRepresentitive = bufferListElement.appThreadRepresentitive;
_bufferList.pop_front();
// remove the first element from the list
ReleaseLock(&_bufferListLock);
// unlock the list
}
VOID SignalBufferSem() {WIND::ReleaseSemaphore(_bufferSem, 1, NULL);}
UINT32 NumBuffersOnList () { return (_bufferList.size());}
private:
struct BUFFER_LIST_ELEMENT // structure of an element of the buffer list
VOID *buf;
UINT64 numElements;
APP_THREAD_REPRESENTITVE *appThreadRepresentitive; // the application thread that owns this buffer
{
};
WIND::HANDLE _bufferSem; // counting semaphore, value is #of buffers on the list, value==0 => WaitForSingleObject
blocks
PIN_LOCK _bufferListLock; // Pin Lock
list<const BUFFER_LIST_ELEMENT> _bufferList;
};
Software & Services Group
106
membuffer_threadpool
VOID ThreadFini(THREADID tid, const CONTEXT *ctxt, INT32 code, VOID *v)
{
// get the APP_THREAD_REPRESENTITVE of this app thread from the Pin TLS
APP_THREAD_REPRESENTITVE * appThreadRepresentitive
= static_cast<APP_THREAD_REPRESENTITVE*>(PIN_GetThreadData(
appThreadRepresentitiveKey, tid));
// wait for all my buffers to be processed
while(appThreadRepresentitive->_freeBufferListManager->NumBuffersOnList() !=
KnobNumBuffersPerAppThread-1)
PIN_Sleep(1);
}
delete appThreadRepresentitive;
PIN_SetThreadData(appThreadRepresentitiveKey, 0, tid);
static VOID FiniUnlocked(INT32 code, VOID *v)
{
BOOL waitStatus;
INT32 threadExitCode;
doExit = TRUE; // indicate that process is exiting
// signal all the Pin Tool threads to wake up and recognize the exit
for (int i=0; i<KnobNumProcessingThreads; i++)
fullBuffersListManager.SignalBufferSem();
// Wait until all Pin Tool threads exit
for (set<PIN_THREAD_UID>::iterator it = uidSet.begin(); it != uidSet.end(); ++it)
waitStatus = PIN_WaitForThreadTermination(*it, PIN_INFINITE_TIMEOUT, &threadExitCode);
}
Software & Services Group
107
System Call Instrumentation
VOID SyscallEntry(THREADID threadIndex, CONTEXT *ctxt, SYSCALL_STANDARD std, VOID *v)
{
ADDRINT appIP = PIN_GetContextReg(ctxt, REG_INST_PTR);
}
printf ("syscall# %d at appIP %x param1 %x param2 %x param3 %x param4 %x param5 %x
param6 %x\n",
PIN_GetSyscallNumber(ctxt, std), appIP,
PIN_GetSyscallArgument(ctxt, std, 0), PIN_GetSyscallArgument(ctxt, std, 1),
PIN_GetSyscallArgument(ctxt, std, 2), PIN_GetSyscallArgument(ctxt, std, 3),
PIN_GetSyscallArgument(ctxt, std, 4), PIN_GetSyscallArgument(ctxt, std, 5));
VOID SyscallExit(THREADID threadIndex, CONTEXT *ctxt, SYSCALL_STANDARD std, VOID *v)
{
printf(" returns: %x\n", PIN_GetSyscallReturn(ctxt, std);
}
int main(int argc, char *argv[])
{
PIN_Init(argc, argv);
// Instrument system calls via these Pin Callbacks and not via analysis functions
PIN_AddSyscallEntryFunction (SyscallEntry, 0);
PIN_AddSyscallExitFunction (SyscallExit, 0);
}
PIN_StartProgram(); // Never returns
Software & Services Group
108
Instrumenting a Process Tree
• Process A creates Process B
– Process B creates Process C and D
– And so forth
• Can use Pin to instrument all or part of the processes of a
process tree
– Use the –follow_exevc Pin invocation switch to turn this on
• Can use different Pin modes (Jit or Probe) on the different
processes in the process tree.
• Can use different Pin Tools on the different processes of a
process tree.
• Architecture of processes in the process tree may be
intermixed: e.g. Process A is 32bit, Process B is 64 bit,
Process C is 64 bit, Process D is 32 bit…
Software & Services Group
109
Instrumenting a Process Tree
// If this Pin Callback returns FALSE, then the child process will run Natively
BOOL FollowChild(CHILD_PROCESS childProcess, VOID * userData)
{
BOOL res; INT appArgc;
CHAR const * const * appArgv;
OS_PROCESS_ID pid = CHILD_PROCESS_GetId(childProcess);
// Get the command line that child process will be Pinned with, these are the Pin invocation switches
// that were specified when this (parent) process was Pinned
CHILD_PROCESS_GetCommandLine(childProcess, &appArgc, &appArgv);
// The Pin invocation switches of the child can be modified
INT pinArgc = 0;
CHAR const * pinArgv[20];
:::: Put values in pinArgv, Set pinArgc to be the number of entries in pinArgv that are to be used
CHILD_PROCESS_SetPinCommandLine(childProcess, pinArgc, pinArgv);
return TRUE; /* Specify Child process is to be Pinned */
}
int main(INT32 argc, CHAR **argv) {
PIN_Init(argc, argv);
cout << " Process is running on Pin in " << PIN_IsProbeMode() ? " Probe " : " Jit " << " mode "
// The FollowChild Pin Callback will be called when the application being Pinned is about to spawn
// child process
PIN_AddFollowChildProcessFunction (FollowChild, 0);
if ( PIN_IsProbeMode() )
PIN_StartProgramProbed(); // Never returns
else
PIN_StartProgram();
}
Software & Services Group
110
CONTEXT* and IARG_CONTEXT
• CONTEXT* is a Handle to the full register context of
the application at a particular point in the execution
• CONTEXT* is passed by default to a number of Pin
Callback functions: e.g.
– ThreadStart (registered by PIN_AddThreadStartFunction)
– BufferFull (registered by PIN_DefineTraceBuffer)
• Can request CONTEXT* be passed to an analysis
function by requesting and IARG_CONTEXT
Software & Services Group
111
CONTEXT* and IARG_CONTEXT
• Passing IARG_CONTEXT to an analysis function has
implications:
– The analysis function will NOT be inlined
– The passing of the IARG_CONTEXT is time consuming
• CONTEXT* can NOT be dereferenced. It is a handle to be
passed to Pin API functions
• Pin API functions supplied to Get and Set registers within the
CONTEXT
– Set has no affect on CONTEXT* passed into analysis function (via
IARG_CONTEXT request)
• Have seen examples of both Get and Set
• Have Pin API functions to Get and Set FP context
Software & Services Group
112
Managing Exceptions and Signals
Software & Services Group
113
Exceptions
• Catch Exceptions that occur in Pin Tool code
– Global exception handler
– PIN_AddInternalExceptionHandler
– Guard code section with exception handler
– PIN_TryStart
– PIN_TryEnd
Software & Services Group
114
Exceptions
VOID InstrumentDivide(INS ins, VOID* v)
{
if ((INS_Mnemonic(ins) == "DIV") &&
(INS_OperandIsReg(ins, 0)))
{ // Will Emulate div instruction with register operand
INS_InsertCall(ins,
IPOINT_BEFORE,
AFUNPTR(EmulateIntDivide),
IARG_REG_REFERENCE, REG_GDX,
IARG_REG_REFERENCE, REG_GAX,
IARG_REG_VALUE, REG(INS_OperandReg(ins, 0)),
IARG_CONTEXT,
IARG_THREAD_ID,
IARG_END);
INS_Delete(ins); // Delete the div instruction
}
int main(int argc, char * argv[])
{
PIN_Init(argc, argv);
INS_AddInstrumentFunction (InstrumentDivide, 0);
PIN_AddInternalExceptionHandler (GlobalHandler, NULL); // Registers a Global Exception Handler
PIN_StartProgram(); // Never returns
return 0;
}
Software & Services Group
115
Exceptions
EXCEPT_HANDLING_RESULT DivideHandler (THREADID tid, EXCEPTION_INFO * pExceptInfo,
PHYSICAL_CONTEXT * pPhysCtxt, // The context when the exception
// occurred
VOID *appContextArg
// The application context when the
// exception occurred
)
{
if(PIN_GetExceptionCode(pExceptInfo) == EXCEPTCODE_INT_DIVIDE_BY_ZERO)
{ // Divide by zero occurred in the code emulating the divide, use PIN_RaiseException to raise this exception
// at the appIP – for handling by the application
cout << " DivideHandler : Caught divide by zero." << PIN_ExceptionToString(pExceptInfo) << endl;
// Get the application IP where the exception occurred from the application context
CONTEXT * appCtxt = (CONTEXT *)appContextArg;
ADDRINT faultIp = PIN_GetContextReg (appCtxt, REG_INST_PTR);
// raise the exception at the application IP, so the application can handle it as it wants to
PIN_SetExceptionAddress (pExceptInfo, faultIp);
PIN_RaiseException (appCtxt, tid, pExceptInfo); // never returns
}
return EHR_CONTINUE_SEARCH;
}
VOID EmulateIntDivide(ADDRINT * pGdx, ADDRINT * pGax, ADDRINT divisor, CONTEXT * ctxt, THREADID tid)
PIN_TryStart(tid, DivideHandler, ctxt); // Register a Guard Code Section Exception Handler
{
UINT64 dividend = *pGdx;
dividend <<= 32;
dividend += *pGax;
*pGax = dividend / divisor;
*pGdx = dividend % divisor;
PIN_TryEnd(tid);
/* Guarded Code Section ends */
}
Software & Services Group
116
Exceptions
EXCEPT_HANDLING_RESULT GlobalHandler(THREADID threadIndex, EXCEPTION_INFO * pExceptInfo,
PHYSICAL_CONTEXT * pPhysCtxt, VOID *v)
{ // Any Exception occurring in Pin Tool, or Pin that is not in a Guarded Code Section will cause this function to be
// executed
cout << "GlobalHandler: Caught unexpected exception. " << PIN_ExceptionToString(pExceptInfo) << endl;
return EHR_UNHANDLED;
}
Software & Services Group
117
Exceptions, Monitoring Application
Exceptions
• PIN_AddContextChangeFunction
– Can monitor and change that application state at
application exceptions
int main(int argc, char **argv)
{
PIN_Init(argc, argv);
PIN_AddContextChangeFunction(OnContextChange, 0);
PIN_StartProgram();
}
Software & Services Group
118
Exceptions, Monitoring Application
Exceptions
static void OnContextChange (THREADID tid,
CONTEXT_CHANGE_REASON reason,
const CONTEXT *ctxtFrom // Application's register state at exception point
CONTEXT *ctxtTo,
// Application's register state delivered to handler
INT32 info,
VOID *v)
{
if (CONTEXT_CHANGE_REASON_SIGRETURN == reason || CONTEXT_CHANGE_REASON_APC == reason
|| CONTEXT_CHANGE_REASON_CALLBACK == reason
|| CONTEXT_CHANGE_REASON_FATALSIGNAL == reason
|| ctxtTo == NULL)
{ // don't want to handle these
return;
}
// change some register values in the context that the application will see at the handler
FPSTATE fpContextFromPin;
// change the bottom 4 bytes of xmm0
PIN_GetContextFPState (ctxtFrom, &fpContextFromPin);
fpContextFromPin.fxsave_legacy._xmm[3] = 'de';
fpContextFromPin.fxsave_legacy._xmm[2] = 'ad';
fpContextFromPin.fxsave_legacy._xmm[1] = 'be';
fpContextFromPin.fxsave_legacy._xmm[0] = 'ef';
PIN_SetContextFPState (ctxtTo,
&fpContextFromPin);
}
// change eax
PIN_SetContextReg(ctxtTo, REG_RAX, 0xbaadf00d);
Software & Services Group
119
Signals
• Establish an interceptor function for signals
delivered to the application
– Tools should never call sigaction() directly to handle
signals.
– function is called whenever the application receives the
requested signal, regardless of whether the application has
a handler for that signal.
– function can then decide whether the signal should be
forwarded to the application
Software & Services Group
120
Signals
• A tool can take over ownership of a signal in order
to:
– use the signal as an asynchronous communication
mechanism to the outside world.
– For example, if a tool intercepts SIGUSR1, a user of the tool could
send this signal and tell the tool to do something. In this usage
model, the tool may call PIN_UnblockSignal() so that it will receive the
signal even if the application attempts to block it.
– "squash" certain signals that the application generates.
– a tool that forces speculative execution in the application may want to
intercept and squash exceptions generated in the speculative code.
• A tool can set only one "intercept" handler for a
particular signal, so a new handler overwrites any
previous handler for the same signal. To disable a
handler, pass a NULL function pointer.
Software & Services Group
121
Signals
BOOL EnableInstrumentation = FALSE;
BOOL SignalHandler(THREADID, INT32, CONTEXT *, BOOL, const EXCEPTION_INFO *, void *)
{
// When tool receives the signal, enable instrumentation. Tool calls
// PIN_RemoveInstrumentation() to remove any existing instrumentation from Pin's code cache.
EnableInstrumentation = TRUE;
PIN_RemoveInstrumentation();
return FALSE; /* Tell Pin NOT to pass the signal to the application. */
VOID Trace(TRACE trace, VOID *)
if (!EnableInstrumentation)
return;
}
{
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
BBL_InsertCall(bbl, IPOINT_BEFORE, AFUNPTR(AnalysisFunc), IARG_INST_PTR, IARG_END);}
int main(int argc, char * argv[])
PIN_Init(argc, argv);
{
PIN_InterceptSignal(SIGUSR1, SignalHandler, 0); // Tool should really determine which signal is NOT in
use by
// application
PIN_UnblockSignal(SIGUSR1, TRUE);
TRACE_AddInstrumentFunction(Trace, 0);
PIN_StartProgram();
}
Software & Services Group
122
Accessing the Decode API
• The decoder/encoder used is called XED
– http://www.pintool.org/docs/24110/Xed/html/
• Tool code can use the XED API
– E.g. decode an instruction inside an analysis routine.
Software & Services Group
123
Accessing the Decode API
extern "C" {
#include "xed-interface.h"
}
static VOID PIN_FAST_ANALYSIS_CALL
MemoryOverWriteAt ( // Pin will NOT inline this function, it is the THEN part
ADDRINT appIP, ADDRINT memWriteAddr, UINT32 numBytesWritten)
{
INT32 column, lineNum;
string fileName;
PIN_GetSourceLocation (appIP, &column, &line, &fileName);
static const xed_state_t dstate = { XED_MACHINE_MODE_LEGACY_32, XED_ADDRESS_WIDTH_32b};
xed_decoded_inst_t xedd;
xed_decoded_inst_zero_set_mode (&xedd, &dstate);
xed_error_enum_t xed_code = xed_decode (&xedd, reinterpret_cast<UINT8*>(appIP), 15);
char buf[256];
xed_decoded_inst_dump_intel_format(&xedd, buf, 256, appIP);
}
printf ("overwrite of %p from instruction at %p %s originating from file %s line %d col %d\n",
KnobMemAddrBeingOverwritten, appIP, buf, fileName.c_str(), lineNum, column);
printf (" writing %d bytes starting at %p\n", numBytesWritten, memWriteAddr);
Software & Services Group
124
Pin Code-Cache API
• The Code-Cache API allows a Pin Tool to:
– Inspect Pin's code cache and/or alter the code cache
replacement policy
– Assume full control of the code cache
– Remove all or selected traces from the code cache
– Monitor code cache activity, including start/end of
execution of code in the code cache
Software & Services Group
125
Pin Code-Cache API
VOID DoSmcCheck(VOID * traceAddr, VOID * traceCopyAddr, USIZE traceSize, CONTEXT * ctxP) {
if (memcmp(traceAddr, traceCopyAddr, traceSize) != 0) /* application code changed */
{
// the jitted trace is no longer valid
free(traceCopyAddr);
CODECACHE_InvalidateTraceAtProgramAddress((ADDRINT)traceAddr);
PIN_ExecuteAt(ctxP); /* Continue jited execution at this application trace */
} }
VOID InstrumentTrace(TRACE trace, VOID *v)
VOID * traceAddr;
VOID * traceCopyAddr;
{
USIZE traceSize;
traceAddr = (VOID *)TRACE_Address(trace); // The appIP of the start of the trace
traceSize = TRACE_Size(trace);
traceCopyAddr = malloc(traceSize);
// The size of the original application trace in bytes
if (traceCopyAddr != 0)
{
memcpy(traceCopyAddr, traceAddr, traceSize);
// Copy of original application code in trace
// Insert a call to DoSmcCheck before every trace
TRACE_InsertCall(trace, IPOINT_BEFORE, (AFUNPTR)DoSmcCheck,
IARG_PTR, traceAddr,
IARG_PTR, traceCopyAddr,
IARG_UINT32 , traceSize,
IARG_CONTEXT,
IARG_END);
}
}
int main(int argc, char * argv[]) {
PIN_Init(argc, argv);
TRACE_AddInstrumentFunction(InstrumentTrace, 0);
PIN_StartProgram();
}
Software & Services Group
126
Transparent debugging, and extending
the debugger
• Transparently debug the application while it is
running on Pin + Pin Tool
– Have detailed explanation in the Pin User Manual
• Use Pin Tool to enhance/extend the debugger
capabilities
– Watchpoint: Is order of magnitude faster when
implemented using Pin Tool
– See previous “Symbols: Accessing Application Debug Info from
a Pin Tool”
– Which branch is branching to address 0
– Easy to write a Pin Tool that implements this
Software & Services Group
127
Part4 Summary
• Boldly went where no pin head has gone before…
– Lived to tell the tail
– membuffer_threadpool tool
– Using multiple buffers in the Pin Buffering API
– Using Pin Tool Threads
– Using Pin and OS locks to synchronize threads
– System call instrumentation
– Instrumenting a process tree
– CONTEXT* and IARG_CONTEXT
– Managing Exceptions and Signals
– Decode API
– Pin Code-Cache API
– Transparent debugging, and extending the debugger
Software & Services Group
128
Part5 Performance #s
Software & Services Group
129
Performance
Applications
SpecINT
SPEC CPU2000
SPEC integer benchmarks
SpecFP
SPEC CPU2000
SPEC floating point benchmarks
Cinebench
MAXON CINEBENCH 10
Graphics performance benchmarks
Povray
POV-Ray
Ray tracing
RealOne
RealOne Player 1.0
Media player
Illustrator
Adobe Illustrator 9.1
Graphics design
Dreamweaver
Adobe Dreamweaver 9
Website design
Director
Macromedia Director 9
3D games, demos
MediaEncoder
Microsoft Media Encoder 9
Audio and video capturing
Word, Excel, PowerPoint,
Access, Outlook
Microsoft Office XP
Office applications
Instrumentation
BBCount
Lightweight
Counts executed basic blocks
MemTrace
Middleweight
Records memory references
MemError (Intel Parallel Inspector)
Heavyweight
Detects memory leaks, uninitialized variables, etc.
Workloads
SpecINT, SpecFP
Reference input
Cinebench, Povray
Rendering an image (scalable)
Other GUI applications
Proprietary Visual Test scripts
Software & Services Group
130
Pin in Windows vs. Pin in Linux Without
Instrumentation
SPEC CPU2000 32bit without Instrumentation
4.0X
Windows
Linux
Slowdown (Relative to Native)
3.5X
3.0X
2.5X
2.0X
1.5X
1.0X
0.5X
as
EO p i
M
EA
N
G
FP
-
ar
eq t
ua
ke
fa
ce
re
c
am
m
p
lu
ca
s
fm
a3
d
si
xt
ra
ck
p
tw
o
EO lf
M
EA
w
up N
w
is
e
sw
im
m
gr
id
ap
pl
u
m
es
a
ga
gl
e
IN
TG
bz
i
ex
vo
rt
ga
p
e
pe on
rlb
m
k
vp
r
gc
c
m
cf
cr
af
ty
pa
rs
er
gz
i
p
0.0X
• For CPU bound applications Pin shows nearly the same performance in Windows and
Linux
• SPECINT overhead is higher than SPECFP - more control instructions and low-tripcount code
131
& Services
Group
Pin uses the same binary translationSoftware
technique
on
Windows and Linux
Pin in Windows vs. Pin in Linux With Instrumentation
SPEC CPU2000 32bit with inscount2
Linux
7X
Slowdown (Relative to Native)
Windows
6X
5X
4X
3X
2X
1X
as
EO p i
M
EA
N
G
FP
-
a
eq rt
ua
ke
fa
ce
re
c
am
m
p
lu
ca
s
fm
a3
d
si
xt
ra
ck
ga
p
vo
rte
x
bz
ip
IN
tw
To
G
EO lf
M
E
wu AN
pw
is
e
sw
im
m
gr
id
ap
pl
u
m
es
a
ga
gl
e
e
pe on
rlb
m
k
vp
r
gc
c
m
cf
cr
af
ty
pa
rs
er
gz
i
p
0X
• Instrumentation overhead dominates the translation overhead
Software & Services Group
132
Pin in Windows vs. Pin in Linux With Instrumentation
SPEC CPU2000 32bit with MemTrace
12X
Windows
Linux
11X
Slowdown (Relative to Native)
10X
9X
8X
7X
6X
5X
4X
3X
2X
1X
as
EO p i
M
EA
N
G
FP
-
ar
eq t
ua
ke
fa
ce
re
c
am
m
p
lu
ca
s
fm
a3
d
si
xt
ra
ck
ga
p
vo
rt
ex
bz
ip
IN
t
Tw
o
G
EO lf
M
EA
w
up N
w
is
e
sw
im
m
gr
id
ap
pl
u
m
es
a
ga
gl
e
e
pe on
rlb
m
k
se
r
ty
pa
r
cf
cr
af
m
vp
r
gc
c
gz
i
p
0X
• Instrumentation overhead dominates the translation overhead
Software & Services Group
133
Windows Applications with Light and Middleweight
Instrumentation
No Instrumentation
BBCount
9.0X
MemTrace
Slowdown (Relative to Native)
8.0X
7.0X
6.0X
5.0X
4.0X
3.0X
2.0X
1.0X
RealOne
MediaEncoder
Director
Dreamweaver
Outlook
Access
PowerPoint
Word
Excel
Illustrator
Povray
Cinebench
SpecFP
SpecINT
0.0X
• #instructions per branch shrinks => BBCount overhead grows
• %memory instructions grows => MemTrace overhead grows
Software & Services Group
134
Windows Applications with Heavyweight
Instrumentation
No Instrumentation
MemError
Slowdown (Relative to Native)
25.0X
20.0X
15.0X
10.0X
5.0X
Excel
Illustrator
Povray
Cinebench
SpecFP
SpecINT
0.0X
• The code translation overhead in Pin is much smaller than the
overhead of non-trivial analysis in tools
• MemError overhead is proportional to the number of memory
accesses
Software & Services Group
135
Pin Performance on Scalable Workloads
Native
CINEBENCH
POV-Ray
No Instrumentation
Speedup (Relative to Single Thread)
Speedup (Relative to Single Thread)
4.0X
3.5X
3.0X
2.5X
2.0X
1.5X
1.0X
0.5X
0.0X
4.0X
BBCount
3.5X
MemTrace
MemError
3.0X
2.5X
2.0X
1.5X
1.0X
0.5X
0.0X
2 threads
4 threads
2 threads
4 threads
• Pin VMM serialization does not impact scalability of the application
– Execution in the code cache is not serialized
• Scalability may drop due to limited memory bandwidth (MemTrace) or
contention for tool private data (MemError)
Software & Services Group
136
Kernel Interaction Overhead
Slowdown Relative to Native per Kernel Interaction
System
Calls
• assa
Exceptions
APCs
Callbacks
12X
10.5X
3X
Cost of a trip in Pin VMM for each system call is high
– ~3000 cycles in VMM vs. ~500 cycles for ring crossing
– Future work: a faster path in VMM for system calls
1.8X
Kernel Interaction Counts
Illustrator
System Calls
Excel
CINEBENCH
POV-Ray
1,659,298
658,683
101,700
75,313
Exceptions
1
0
0
0
APCs
6
6
24
24
Callbacks
73,062
68,767
961
7,682
Overhead vs. Total
Runtime
3.3%
2.8%
<1%
<1%
• Total overhead for handling kernel interactions is relatively low
– Kernel interactions are infrequent for majority of applications
Software & Services Group
137
Overall Summary
• Pin is Intel’s dynamic binary instrumentation engine
• Pin can be used to instrument all user level code
–
–
–
–
–
–
Windows, Linux
IA-32, Intel64, IA64
Product level robustness
Jit-Mode for full instrumentation: Thread, Function, Trace, BBL, Instruction
Probe-Mode for Function Replacement/Wrapping/Instrumentation only.
Pin supports multi-threading, no serialization of jitted application nor of instrumentation code
• Pin API makes Pin tools easy to write
–
Presented many tools, many fit on 1 ppt slide
• Pin performance is good
–
Pin APIs provide for writing efficient Pin tools
• Popular and well supported
–
30,000+ downloads, 400+ citations
• Free DownLoad
–
–
www.pintool.org
Includes: Detailed user manual, source code for 100s of Pin tools
• Pin User Group
–
–
http://tech.groups.yahoo.com/group/pinheads/
Pin users and Pin developers answer questions
Software & Services Group
138