What is Process Virtualization?

Process Virtualization
and Symbiotic Optimization
Kim Hazelwood
ACACES Summer School
July 2009
About Your Instructor
Currently
– Assistant Professor at University of Virginia
– Faculty Consultant at Intel
Previously
– PostDoc at Intel (2004-2005)
– PhD from Harvard (2004)
– Four summer internships (HP & IBM)
– Worked with Dynamo, Jikes RVM, …
Other Interests
– Marathons (Boston, NYC, Disney)
– Reality TV Shows
– Family (8 month old at home!)
ACACES 2009 – Process Virtualization
About the Course
• Day 1 – What is Process Virtualization?
• Day 2 – Building Process Virtualization Systems
• Day 3 – Using Process Virtualization Systems
• Day 4 – Symbiotic Optimization
• We’ll use Pin as a case study (www.pintool.org)
• You’ll have homework!
What is Process Virtualization?
System virtualization – allows multiple OSes to
share the same hardware
Process virtualization – runs as a normal
application (on top of an OS) and supports a
single process
[Figure: two software stacks side by side.
System Virtualization: App1 on OS1 and App2 on OS2, both on a VMM, on HW.
Process Virtualization: App1 on a DBT and App2 on a DBI, both on a single OS, on HW.]
Classifying Virtualization
Dynamic binary optimization (x86 → x86--)
• Complement the static compiler
– User inputs, phases, DLLs, hardware features
– Examples: DynamoRIO, Mojo, Strata
Dynamic translation (x86 → PPC)
• Convert applications to run on a new architecture
– Examples: Rosetta, Transmeta CMS, DAISY
Dynamic instrumentation (x86 → x86++)
• Inspect/add features to existing applications
– Examples: Pin, Valgrind
A Simple Example of Instrumentation
Inserting extra code into a program to collect
runtime information
counter++;
sub $0xff, %edx
counter++;
cmp %esi, %edx
counter++;
jle <L1>
counter++;
mov $0x1, %edi
counter++;
add $0x10, %eax
Instruction Count Output
$ /bin/ls
Makefile imageload.out itrace proccount
imageload inscount atrace itrace.out
$ pin -t inscount.so -- /bin/ls
Makefile imageload.out itrace proccount
imageload inscount atrace itrace.out
Count 422838
A Simple Example of Optimization
On Pentium 3, inc is faster than add
On Pentium 4, add is faster than inc
Pentium 3 version:
sub $0xff, %edx
cmp %esi, %edx
jle <L1>
mov $0x1, %edi
inc %eax
Pentium 4 version:
sub $0xff, %edx
cmp %esi, %edx
jle <L1>
mov $0x1, %edi
add $0x1, %eax
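The inc/add swap above can be pictured as a tiny peephole rewrite applied per target CPU. The following is only a sketch under stated assumptions: the `Cpu` enum and the `retarget` function are hypothetical names invented for illustration, instructions are handled as AT&T-syntax text rather than decoded bytes, and only the one pattern from the slide is recognized.

```cpp
#include <cassert>
#include <string>

// Hypothetical CPU targets for this sketch (not taken from any real API).
enum class Cpu { Pentium3, Pentium4 };

// Rewrite one AT&T-syntax instruction for the given CPU.
// Only the inc <-> add $0x1 pair from the slide is handled; anything else
// is returned unchanged. Caveat: inc leaves the carry flag untouched while
// add updates it, so a real optimizer must prove CF is dead before swapping.
std::string retarget(const std::string& insn, Cpu cpu) {
    assert(!insn.empty() && "expected exactly one instruction per call");
    const std::string inc = "inc ";
    const std::string add1 = "add $0x1, ";
    if (cpu == Cpu::Pentium4 && insn.compare(0, inc.size(), inc) == 0) {
        return add1 + insn.substr(inc.size());   // inc %r -> add $0x1, %r
    }
    if (cpu == Cpu::Pentium3 && insn.compare(0, add1.size(), add1) == 0) {
        return inc + insn.substr(add1.size());   // add $0x1, %r -> inc %r
    }
    return insn;
}
```

Calling `retarget("inc %eax", Cpu::Pentium4)` yields the Pentium 4 form shown above, while every other instruction in the sequence passes through untouched.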
Research Applications
Computer Architecture
• Trace Generation
• Fault Tolerance Studies
• Emulating New Instructions
• Cache simulations
Multicore
• Thread analysis
– Thread profiling
– Race detection
Program Analysis
• Code coverage
• Call-graph generation
• Memory-leak detection
• Instruction profiling
Compilers
• Compare programs from competing compilers
Security
• Add security checks and features
Approaches
• Source modification:
– Modify source programs
• Binary modification:
– Modify executables directly
Advantages for binary modification
• Language independent
• Machine-level view
• Modify legacy/proprietary software
Static vs Dynamic Approaches
Dynamic approaches are more robust
• No need to recompile or relink
• Discover code at runtime
• Handle dynamically-generated code
• Attach to running processes
The Code Discovery Problem on x86
[Figure: a code region laid out as Instr 1, Instr 2, Instr 3, Jump Reg, DATA, Instr 5, Instr 6, Uncond Branch, PADDING, Instr 8. Jump Reg is an indirect jump to an unknown target (??), DATA is data interspersed with code, and PADDING is pad for alignment.]
Dynamic Modification: Approaches
JIT Mode
• Create a modified copy of the application on-the-fly
• Original code never executes
More flexible, more common approach
Probe Mode
• Modifies the original application instructions
• Inserts jumps to modified code (trampolines)
Lower overhead (less flexible) approach
JIT-Mode Binary Modification
Generate and cache modified copies of
instructions
[Figure: the JIT pipeline, with EXE, Transform, Code Cache, Execute, and Profile stages]
Modified (cached) instructions are executed in
lieu of original instructions
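The generate-and-cache idea can be illustrated with a toy model. This is a minimal sketch, not how Pin or any other system is implemented: `CodeCache`, `fetch`, and the translator callback are hypothetical names, and a "modified copy" is represented as a string instead of machine code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Toy model of a JIT-mode code cache. All names here are hypothetical.
class CodeCache {
public:
    // Produces the modified copy of the block starting at a given address.
    using Translator = std::function<std::string(std::uint64_t)>;

    explicit CodeCache(Translator translate) : translate_(std::move(translate)) {
        assert(translate_ && "a translator callback is required");
    }

    // Return the modified copy for the block at `addr`, translating it on
    // the first request and reusing the cached copy on every later request.
    const std::string& fetch(std::uint64_t addr) {
        auto it = cache_.find(addr);
        if (it == cache_.end()) {
            ++translations_;
            it = cache_.emplace(addr, translate_(addr)).first;
        }
        return it->second;
    }

    // Number of times the translator actually ran (cache misses).
    std::size_t translations() const { return translations_; }

private:
    Translator translate_;
    std::unordered_map<std::uint64_t, std::string> cache_;
    std::size_t translations_ = 0;
};
```

A block that executes a thousand times inside a loop is translated once and then served from the cache, which is why the translation cost is paid per static block, not per execution.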
JIT-Mode Instrumentation
[Figure sequence: the original code (blocks 1 to 7) shown beside Pin's code cache]
• Step 1: Fetch the trace starting at block 1 and start instrumentation. The code cache holds 1’, 2’, and 7’; exits point back to the VMM.
• Step 2: Transfer control into the code cache (block 1).
• Step 3: Fetch and instrument a new trace (3’, 5’, 6’), with trace linking between cached traces.
Instrumentation Approaches
JIT Mode
• Create a modified copy of the application on-the-fly
• Original code never executes
More flexible, more common approach
Probe Mode
• Modify the original application instructions
• Insert jumps to instrumentation code (trampolines)
Lower overhead (less flexible) approach
A Sample Probe
• A probe is a jump instruction that overwrites
original instruction(s) in the application
– Copy/translate original bytes so probed functions
can be called
Original function entry point:
0x400113d4: push %ebp
0x400113d5: mov %esp,%ebp
0x400113d7: push %edi
0x400113d8: push %esi
0x400113d9: push %ebx
Entry point overwritten with probe:
0x400113d4: jmp 0x41481064
0x400113d9: push %ebx
Copy of entry point w/ original bytes:
0x50000004: push %ebp
0x50000005: mov %esp,%ebp
0x50000007: push %edi
0x50000008: push %esi
0x50000009: jmp 0x400113d9
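The probe at 0x400113d4 spans five bytes (the next intact instruction is at 0x400113d9), which is consistent with an x86 `jmp rel32` (opcode 0xE9 followed by a 32-bit displacement). Assuming that encoding, the displacement stored in the probe can be computed as sketched here; `jmp_rel32` is a hypothetical helper name, not part of Pin.

```cpp
#include <cassert>
#include <cstdint>

// Displacement for a 5-byte x86 `jmp rel32` placed at address `from` that
// should land on address `to`. The CPU adds the displacement to the address
// of the *next* instruction, hence the `+ 5`.
std::int32_t jmp_rel32(std::uint64_t from, std::uint64_t to) {
    const std::int64_t disp =
        static_cast<std::int64_t>(to) - static_cast<std::int64_t>(from + 5);
    // A rel32 jump can only reach targets within a signed 32-bit distance.
    assert(disp >= INT32_MIN && disp <= INT32_MAX && "target out of rel32 range");
    return static_cast<std::int32_t>(disp);
}
```

The same helper covers both jumps in the example: the probe itself (`jmp_rel32(0x400113d4, 0x41481064)`) and the jump at 0x50000009 that returns from the copied bytes to 0x400113d9, which yields a negative displacement because the target lies at a lower address.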
Probe Instrumentation
Advantages:
• Low overhead – few percent
• Less intrusive – execute original code
Disadvantages:
• More tool writer responsibility
• Restrictions on where to modify (routine-level)
Probe Tool Writer Responsibilities
No control flow into the instruction space
where probe is placed
• 6 bytes on IA32, 7 bytes on Intel64, bundle on
IA64
• Branch into “replaced” instructions will fail
• Probes at function entry point only
Thread safety for insertion/deletion of probes
• During image load callback is safe
• Only loading thread has a handle to the image
Replacement function has same behavior as
original
Probe vs. JIT Summary
                     Probes                                 JIT
Overhead             Few percent                            50% or higher
Intrusive            Low                                    High
Granularity          Function boundary                      Instruction
Safety & Isolation   More responsibility for tool writer    High
Process Virtualization Systems
Readily Available
• DynamoRIO
• Valgrind
• Pin
Available By Request
• Strata
• Adore
Unavailable
• Transmeta CMS
• Dynamo
DynamoRIO
Valgrind
Pin
Intel Pin
Dynamic Instrumentation:
• Do not need source code, recompilation, post-linking
Programmable Instrumentation:
• Provides rich APIs to write in C/C++ your own
instrumentation tools (called Pintools)
Multiplatform:
• Supports x86, x86-64, Itanium, XScale
• Supports Linux, Windows, MacOS
Robust:
• Instruments real-life applications: Database, web browsers, …
• Instruments multithreaded applications
• Supports signals
Efficient:
• Applies compiler optimizations on instrumentation code
Using Pin
Launch and instrument an application
$ pin -t pintool.so -- application
• pin: the instrumentation engine (provided in the kit)
• pintool.so: the instrumentation tool (write your own, or use one provided in the kit)
Attach to and instrument an application
$ pin -t pintool.so -pid 1234
Pin Instrumentation APIs
Basic APIs are architecture independent:
• Provide common functionalities like determining:
– Control-flow changes
– Memory accesses
Architecture-specific APIs
• e.g., Info about opcodes and operands
Call-based APIs:
• Instrumentation routines
• Analysis routines
Instrumentation vs. Analysis
Concepts borrowed from the ATOM tool:
Instrumentation routines define where
instrumentation is inserted
• e.g., before instruction
– Occurs the first time an instruction is executed
Analysis routines define what to do when
instrumentation is activated
• e.g., increment counter
– Occurs every time an instruction is executed
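The once-versus-every-time distinction can be made concrete with a toy engine. This is a sketch for intuition only and is not the Pin API: `ToyEngine`, `Instrument`, and `Analysis` are hypothetical names, and "instructions" are plain integer addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Toy model of the instrumentation/analysis split; not the Pin API.
struct ToyEngine {
    // Analysis routine: what to do each time an instruction executes.
    using Analysis = std::function<void()>;
    // Instrumentation routine: decides which analysis routine (if any)
    // to attach to the instruction at `addr`.
    using Instrument = std::function<Analysis(int addr)>;

    explicit ToyEngine(Instrument instrument) : instrument_(std::move(instrument)) {
        assert(instrument_ && "an instrumentation routine is required");
    }

    // Execute a dynamic stream of instruction addresses.
    void run(const std::vector<int>& executed) {
        for (int addr : executed) {
            auto it = hooks_.find(addr);
            if (it == hooks_.end()) {
                // First execution of this instruction: instrument it once.
                ++instrumentation_calls;
                it = hooks_.emplace(addr, instrument_(addr)).first;
            }
            if (it->second) it->second();  // Runs on every execution.
        }
    }

    std::size_t instrumentation_calls = 0;

private:
    Instrument instrument_;
    std::map<int, Analysis> hooks_;
};
```

Running a three-instruction loop body three times through this engine invokes the instrumentation routine 3 times and the analysis routine 9 times, mirroring the static-versus-dynamic split described above.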
Pintool 1: Instruction Count
counter++;
sub $0xff, %edx
counter++;
cmp %esi, %edx
counter++;
jle <L1>
counter++;
mov $0x1, %edi
counter++;
add $0x10, %eax
Pintool 1: Instruction Count Output
$ /bin/ls
Makefile imageload.out itrace proccount
imageload inscount0 atrace itrace.out
$ pin -t inscount0.so -- /bin/ls
Makefile imageload.out itrace proccount
imageload inscount0 atrace itrace.out
Count 422838
ManualExamples/inscount0.cpp
#include <iostream>
#include "pin.H"

UINT64 icount = 0;

// Analysis routine: runs every time an instruction executes.
void docount() { icount++; }

// Instrumentation routine: inserts a call to docount before each instruction.
void Instruction(INS ins, void *v)
{
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

// Called when the application exits.
void Fini(INT32 code, void *v)
{ std::cerr << "Count " << icount << std::endl; }

int main(int argc, char * argv[])
{
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();  // never returns
    return 0;
}
Pintool 2: Instruction Trace
Print(ip);
sub $0xff, %edx
Print(ip);
cmp %esi, %edx
Print(ip);
jle <L1>
Print(ip);
mov $0x1, %edi
Print(ip);
add $0x10, %eax
Need to pass the ip argument to the analysis routine (printip())
Pintool 2: Instruction Trace Output
$ pin -t itrace.so -- /bin/ls
Makefile imageload.out itrace proccount
imageload inscount0 atrace itrace.out
$ head -4 itrace.out
0x40001e90
0x40001e91
0x40001ee4
0x40001ee5
ManualExamples/itrace.cpp
#include <stdio.h>
#include "pin.H"

FILE * trace;

// Analysis routine: prints the address of each executed instruction.
void printip(void *ip) { fprintf(trace, "%p\n", ip); }

// Instrumentation routine: IARG_INST_PTR is the argument passed to the
// analysis routine.
void Instruction(INS ins, void *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip,
                   IARG_INST_PTR, IARG_END);
}

void Fini(INT32 code, void *v) { fclose(trace); }

int main(int argc, char * argv[]) {
    trace = fopen("itrace.out", "w");
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();  // never returns
    return 0;
}
Examples of Arguments to Analysis
Routine
IARG_INST_PTR
– Instruction pointer (program counter) value
IARG_UINT32 <value>
– An integer value
IARG_REG_VALUE <register name>
– Value of the register specified
IARG_BRANCH_TARGET_ADDR
– Target address of the branch instrumented
IARG_MEMORY_READ_EA
– Effective address of a memory read
And many more … (refer to the manual for details)
Instrumentation Points
Instrument points relative to an instruction:
• Before: IPOINT_BEFORE
• After:
– Fall-through edge: IPOINT_AFTER
– Taken edge: IPOINT_TAKEN_BRANCH
[Figure: instrumentation points around jle <L1>]
      cmp %esi, %edx
      count()            (IPOINT_BEFORE)
      jle <L1>
      count()            (IPOINT_AFTER, fall-through edge)
      mov $0x1, %edi
<L1>:
      count()            (IPOINT_TAKEN_BRANCH, taken edge)
      mov $0x8,%edi
Instrumentation Granularity
Instrumentation can be done at three different granularities:
• Instruction
• Basic block
– A sequence of instructions terminated at a control-flow changing instruction
– Single entry, single exit
• Trace
– A sequence of basic blocks terminated at an unconditional control-flow changing instruction
– Single entry, multiple exits
Example (1 Trace, 2 BBs, 6 insts):
sub $0xff, %edx
cmp %esi, %edx
jle <L1>
mov $0x1, %edi
add $0x10, %eax
jmp <L2>
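The basic-block definition above (a run of instructions ending at a control-flow changing instruction) can be sketched as a small splitter over mnemonics. This is an illustrative simplification: `is_control_flow` and `basic_block_sizes` are hypothetical names, the list of control-flow mnemonics is a deliberately tiny assumption, and the input is assumed to already be a single trace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True for the control-flow mnemonics this sketch knows about
// (an assumed, deliberately tiny list).
bool is_control_flow(const std::string& mnemonic) {
    return mnemonic == "jmp" || mnemonic == "jle" ||
           mnemonic == "call" || mnemonic == "ret";
}

// Split one trace into basic blocks and return the instruction count of
// each block. A block ends at every control-flow changing instruction.
std::vector<std::size_t> basic_block_sizes(const std::vector<std::string>& mnemonics) {
    std::vector<std::size_t> sizes;
    std::size_t current = 0;
    for (const std::string& m : mnemonics) {
        assert(!m.empty() && "empty mnemonic");
        ++current;
        if (is_control_flow(m)) {
            sizes.push_back(current);
            current = 0;
        }
    }
    if (current > 0) sizes.push_back(current);  // trailing block without a branch
    return sizes;
}
```

For the six-instruction example trace (sub, cmp, jle, mov, add, jmp) this yields two blocks of three instructions each, so a per-block counter needs two analysis calls where a per-instruction counter needs six.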
Pintool 3: Faster Instruction Count
counter += 3
sub $0xff, %edx
cmp %esi, %edx
jle <L1>
counter += 2
mov $0x1, %edi
add $0x10, %eax
ManualExamples/inscount1.cpp
#include <stdio.h>
#include "pin.H"

UINT64 icount = 0;

// Analysis routine: adds the number of instructions in the executed block.
void docount(UINT32 c) { icount += c; }

// Instrumentation routine: visits every basic block (bbl) in the trace and
// inserts one call per block instead of one call per instruction.
void Trace(TRACE trace, void *v) {
    for (BBL bbl = TRACE_BblHead(trace);
         BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
        BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount,
                       IARG_UINT32, BBL_NumIns(bbl), IARG_END);
    }
}

void Fini(INT32 code, void *v) {
    fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
}

int main(int argc, char * argv[]) {
    PIN_Init(argc, argv);
    TRACE_AddInstrumentFunction(Trace, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();  // never returns
    return 0;
}
What Did We Learn Today?
• Overview of Process Virtualization
• Approaches
• Source vs. Binary
• Static vs. Dynamic
• JIT vs. Probes
• Three Available Systems
• Three Simple Examples
Want More Info?
• Read Jim Smith’s book: Virtual Machines
• Download one (or more) of them!
– Pin: www.pintool.org
– DynamoRIO: code.google.com/p/dynamorio
– Valgrind: www.valgrind.org
• Day 1 – What is Process Virtualization?
• Day 2 – Building Process Virtualization Systems
• Day 3 – Using Process Virtualization Systems
• Day 4 – Symbiotic Optimization