Dixie: A Retargetable Binary Instrumentation Tool

DIXIE
Binary Translation and Optimization for Multiple ISAs
Computer Architecture Department
Universitat Politècnica de Catalunya-Barcelona
www.ac.upc.es/dixie
UPC people involved
- Roger Espasa
- Agustín Fernández
- Manel Fernández
- Victor Moya
- Juan Lopez
- Silvia Cernuda
- Antonio Parada
- Albert Ribé
- Álex Ramírez
Dixie
- Static binary translator
  - Accepts multiple ISAs (Alpha, x86, PPC, Mips, Convex)
  - Translates to a common IR (Dixie ISA)
- Static binary instrumentation
  - Works on common IR but reflects source ISA
- Static binary optimizer
  - Optimizes the common IR
  - Generates native code from common IR
    - Multiple targets also supported (Alpha, Mips)
- Dixie Virtual Machine
  - Can run binaries specified in the common IR
  - Also runs binaries with a mixture of common/native code
Dixie overview
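[Figure: block diagram of the Dixie tool chain. Target binaries for the target ISAs (Alpha, Convex, PowerPC, x86, Mips) are read by DIXIEC, which produces a Dixie binary. JANGO works from a user specification. SPEEDY generates code for the native ISAs (Alpha, Mips, ...). The DVM (Dixie Virtual Machine) runs the result and connects to the user simulator.]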
Outline
- Motivation
- DIXIE Architecture
- Debugging Tools
- Performance
- Summary
Binary Translation
- For embedded processors
  - The embedded market
    - Is rapidly moving
    - Changes processors frequently
  - Software (development, porting) is a major cost issue
  - Binary translation is cheaper than retargeting gcc
- Goals
  - Retargeting must be FAST and EASY
  - Support different ISAs
  - Provide good debugging tools
    - To ease writing ISA descriptions
    - To verify correctness of translations
- Techniques
  - Static translation (as much as possible)
  - Some dynamic translation (only if necessary)
Binary Optimization
- Inevitably, binary translation introduces overheads
  - Use static and dynamic optimization to
    - Adapt better to the new chip
    - Offset overheads of static binary translation
- Goals
  - Eliminate overheads due to
    - Manual translation process
    - Intermediate ISA lack of expressiveness
    - Incremental development of the optimizer
- Techniques
  - Static optimization (as much as possible)
  - Dynamic optimization (only if necessary)
  - Optimized blocks still run within the Virtual Machine
Instrumentation
- Instrumentation of program binaries
  - For computer architecture research
  - Due to lack of access to 'exotic' machines
  - Historical origin of Dixie...
- Many classes of tools, but...
  - Different tools for different machines
  - Porting tools is difficult
  - Few tools allow research on vector machines or new ISAs
  - Lack of wrong-path information
- Dixie goals
  - Cross-platform instrumentation
  - Research on multiple & discontinued ISAs
  - Full architecture coverage
  - Wrong-path information
Dixie compiler
[Figure: the Dixie overview diagram, shown again for the Dixie compiler (DIXIEC) stage.]
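As a rough illustration of what translating target code into a common IR involves, the sketch below handles only the three target instructions that appear on the "Breakpoints: trace" slide. The register mapping (`a0` to `r10`, `a1` to `r11`, `a2` to `r12`), the use of `r500` as a temporary, and the function names are assumptions read off that single example; they are not Dixie's actual translation tables.

```python
# Toy static translator: target assembly -> Dixie-like IR.
# Mapping and templates are inferred from one example in this deck
# and are illustrative only.

REG_MAP = {"a0": "r10", "a1": "r11", "a2": "r12"}  # assumed register mapping
TMP = "r500"  # temporary virtual register, as seen in the deck's example


def translate(line):
    """Translate one target instruction into a list of Dixie-like instructions."""
    opcode, operands = line.split(None, 1)
    src, dst = [op.strip() for op in operands.split(",")]
    if opcode == "mov":
        # Target syntax is "mov src,dst"; the IR puts the destination first.
        return [f"MOV.lo.32 {REG_MAP[dst]},{REG_MAP[src]}"]
    if opcode == "ld.w" and src.startswith("@"):
        # Indirect load "@off(base)": fetch the pointer, then the value.
        offset, base = src[1:].rstrip(")").split("(")
        return [
            f"LOAD.lo.32 {TMP},{REG_MAP[base]},#{offset}",
            f"LOAD.lo.32 {REG_MAP[dst]},{TMP},#0",
        ]
    if opcode == "sub.w" and src.startswith("#"):
        # Immediate subtract; the ".c2" variant is copied from the example.
        return [f"SUB.c2.32 {REG_MAP[dst]},{REG_MAP[dst]},{src}"]
    raise NotImplementedError(f"no translation template for: {line}")


def translate_program(lines):
    """Translate a whole instruction list, one instruction at a time."""
    out = []
    for line in lines:
        out.extend(translate(line))
    return out


if __name__ == "__main__":
    for ir in translate_program(["mov a0,a1", "ld.w @8(a1),a2", "sub.w #8,a2"]):
        print(ir)
```

A real translator would be driven by a per-ISA description rather than hard-coded branches; the point here is only that one target instruction may expand into several IR instructions.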
Jango
[Figure: the Dixie overview diagram, shown again for the JANGO stage.]
Breakpoints: trace
Target code        Dixie code (DIXIEC output)    Instrumented Dixie code (JANGO output)
mov a0,a1          MOV.lo.32 r11,r10             MOV.lo.32 r11,r10
ld.w @8(a1),a2     LOAD.lo.32 r500,r11,#8        TRACE vpc,r11,#8
                                                 LOAD.lo.32 r500,r11,#8
                   LOAD.lo.32 r12,r500,#0        TRACE vpc,r500,#0
                                                 LOAD.lo.32 r12,r500,#0
sub.w #8,a2        SUB.c2.32 r12,r12,#8          SUB.c2.32 r12,r12,#8
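The instrumentation step in this example can be sketched as a pass that places a TRACE instruction in front of every LOAD, reusing the load's base register and offset. This is a minimal sketch under the assumption that the user specification asks for all loads to be traced; the function name `insert_traces` and the string-based instruction format are hypothetical, not JANGO's interface.

```python
def insert_traces(dixie_code):
    """Return a copy of dixie_code with a TRACE before every LOAD.

    Each instruction is a string such as "LOAD.lo.32 r500,r11,#8".
    The TRACE operands (vpc, base register, offset) follow the
    pattern shown on the slide.
    """
    out = []
    for instr in dixie_code:
        opcode, operands = instr.split(None, 1)
        if opcode.startswith("LOAD"):
            _dest, base, offset = operands.split(",")
            out.append(f"TRACE vpc,{base},{offset}")
        out.append(instr)
    return out


if __name__ == "__main__":
    code = [
        "MOV.lo.32 r11,r10",
        "LOAD.lo.32 r500,r11,#8",
        "LOAD.lo.32 r12,r500,#0",
        "SUB.c2.32 r12,r12,#8",
    ]
    for line in insert_traces(code):
        print(line)
```

Because the pass works on the common IR, the same instrumentation logic applies regardless of which target ISA the code was translated from.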
Speedy & DVM
- Dixie binary is optimized by Speedy
  - Optimizations at basic block (BB) level
  - Translates Dixie BBs into native code
  - Generates .speedy sections
- Dixie binary is runnable on top of the DVM
  - Emulates the behavior of each Dixie instruction
    - Interpreting each Dixie instruction
    - Jumping into sequences of "Speedy" BBs
  - Interacts with the user simulator
    - Through trace instructions inserted by Jango
  - Maps target system calls into host system calls
    - Through DixOS
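The mix of interpretation and native execution described above can be pictured as a dispatch loop: at each basic block the virtual machine either jumps into an optimized version, if one exists, or interprets the block instruction by instruction. The following is a toy sketch only. The instruction set (`MOVI`, `ADD`, `SUBI`, `BNEZ`, `HALT`), the dictionary of "native" blocks (plain Python functions standing in for `.speedy` code), and all names are invented for illustration and do not reflect the DVM's real design.

```python
def interpret_block(block, regs):
    """Interpret one basic block; return the next block id, or None to stop."""
    for instr in block:
        op = instr[0]
        if op == "MOVI":                       # MOVI dst, imm
            regs[instr[1]] = instr[2]
        elif op == "ADD":                      # ADD dst, src1, src2
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
        elif op == "SUBI":                     # SUBI dst, src, imm
            regs[instr[1]] = regs[instr[2]] - instr[3]
        elif op == "BNEZ":                     # BNEZ reg, taken, fallthrough
            return instr[2] if regs[instr[1]] != 0 else instr[3]
        elif op == "HALT":
            return None
        else:
            raise ValueError(f"unknown opcode {op}")
    raise ValueError("basic block has no terminator")


def run(blocks, native_blocks, entry=0):
    """Run from block `entry`, preferring native blocks when available."""
    regs = {}
    vpc = entry
    while vpc is not None:
        if vpc in native_blocks:
            vpc = native_blocks[vpc](regs)     # stand-in for a jump into native code
        else:
            vpc = interpret_block(blocks[vpc], regs)
    return regs


# Example program: sum the integers 5, 4, ..., 1 into r2.
BLOCKS = {
    0: [("MOVI", "r1", 5), ("MOVI", "r2", 0), ("BNEZ", "r1", 1, 2)],
    1: [("ADD", "r2", "r2", "r1"), ("SUBI", "r1", "r1", 1), ("BNEZ", "r1", 1, 2)],
    2: [("HALT",)],
}


def loop_native(regs):
    """Hand-written equivalent of block 1, standing in for optimized code."""
    regs["r2"] += regs["r1"]
    regs["r1"] -= 1
    return 1 if regs["r1"] != 0 else 2


if __name__ == "__main__":
    print(run(BLOCKS, {}))                  # fully interpreted
    print(run(BLOCKS, {1: loop_native}))    # hot loop runs "natively"
```

Both runs must leave the registers in the same state; that equivalence is what lets optimized and unoptimized blocks coexist in one binary.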
DVM Portability
- DVM runs on all major hardware combinations:

             Little Endian                  Big Endian
  32 bits    x86 / LINUX                    Power2 / AIX, Sparc / SUNOS
  64 bits    Alpha / OSF1, IA64 / LINUX     MIPS / IRIX
Speedy Architecture
- Front End: Understands Dixie ISA
  - Optimizes Dixie code (NOP, VPC, CSE)
  - Lowers representation
    - Load virtual registers into physical registers
    - Local register allocation
    - Load large constants into registers
- Back End: Translates Dixie ISA into target ISA
  - Instruction translation
    - Opcode selection
    - Big/little endian memory access
    - Alignment issues
  - Peephole optimizer
    - Recognize instruction sequences
    - Remove redundant loads
    - Remove redundant branches
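One of the peephole steps listed above, removing redundant loads, can be sketched as a single pass over a basic block that remembers which register already holds the value at a given (base, offset) address. The tuple-based instruction format, the conservative rule that any STORE invalidates everything, and the function names are assumptions made for this sketch; Speedy's actual optimizer is not described in the slides.

```python
def _invalidate(available, reg):
    """Forget every remembered load that depends on register `reg`."""
    stale = [key for key, holder in available.items()
             if holder == reg or key[0] == reg]
    for key in stale:
        del available[key]


def remove_redundant_loads(block):
    """Drop or simplify repeated loads inside one basic block.

    Instructions are tuples (op, dst, a, b). For LOAD, `a` is the base
    register and `b` the offset. For STORE, `dst` is the stored register.
    """
    available = {}   # (base, offset) -> register currently holding that value
    out = []
    for instr in block:
        op, dst, a, b = instr
        if op == "STORE":
            available.clear()          # conservative: memory may have changed
            out.append(instr)
            continue
        if op == "LOAD":
            holder = available.get((a, b))
            if holder == dst:
                continue               # value is already in dst: drop the load
            if holder is not None:
                out.append(("MOV", dst, holder, None))  # reuse the loaded value
                _invalidate(available, dst)
                continue
            out.append(instr)
            _invalidate(available, dst)
            if dst != a:               # a load that overwrites its base is not reusable
                available[(a, b)] = dst
            continue
        out.append(instr)
        _invalidate(available, dst)    # any other op overwrites dst
    return out


if __name__ == "__main__":
    block = [
        ("LOAD", "r500", "r11", 8),
        ("LOAD", "r12", "r500", 0),
        ("LOAD", "r500", "r11", 8),    # same address, same register: removed
        ("ADD", "r12", "r12", "r500"),
        ("LOAD", "r13", "r11", 8),     # same address, new register: becomes MOV
    ]
    for instr in remove_redundant_loads(block):
        print(instr)
```

Keeping the analysis local to a basic block matches the slide's statement that Speedy optimizes at basic block level, and it avoids any need for control-flow information.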
Debugging
- Porting to a new ISA is not easy
  - Many "cut-and-paste" bugs
  - A trivial bug may take weeks to be found without appropriate tools

    andiu. ra, rs, ui
      MOV.lo.32 r(TMP0),ui
      SHL.lo.32 r(TMP0),r(TMP0),32
      AND.lo.32 (ra),r(rs),r(TMP0)
      CMPLT.c2.32 r(ICR(POSCRI(0))),r(ra),0
      CMPGT.c2.32 r(ICR(POSCRI(1))),r(ra),0
      CMPEQ.lo.32 r(ICR(POSCRI(2))),r(ra),0
      AND.lo.32 r(TMP0),r(XER),0x80000000
      CMPNE.lo.32 r(ICR(POSCRI(3))),r(TMP0),0

- We would like developers to
  - "Test-as-you-go" every instruction description
  - Test each instruction almost in isolation
  - Quickly compare DVM and native results
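The "compare DVM and native results" idea amounts to differential testing: run one instruction description on a handful of inputs and check each result against a trusted reference. The sketch below is a toy under several assumptions: the reference semantics are taken to be "AND with the immediate shifted into the upper halfword" (`ra = rs & (ui << 16)`), a 32-bit shift is assumed to discard bits shifted out, condition-register updates are ignored, and the function names are hypothetical. The `shift` parameter lets the same description be evaluated with the shift amount written on the slide and with 16.

```python
MASK32 = 0xFFFFFFFF


def reference_and_upper(rs, ui):
    """Assumed intended semantics: ra = rs AND (ui << 16), in 32 bits."""
    return rs & ((ui << 16) & MASK32)


def described_and_upper(rs, ui, shift=32):
    """Toy evaluation of the description's first three steps.

    MOV tmp,ui ; SHL tmp,tmp,shift ; AND ra,rs,tmp
    Bits shifted past bit 31 are dropped (an assumption about SHL).
    """
    tmp = ui & MASK32
    tmp = (tmp << shift) & MASK32
    return rs & tmp


def first_mismatch(candidate, reference, cases):
    """Return the first (rs, ui) input where the two disagree, else None."""
    for rs, ui in cases:
        if candidate(rs, ui) != reference(rs, ui):
            return (rs, ui)
    return None


CASES = [(0, 0), (0xFFFFFFFF, 0x0001), (0x12345678, 0xFFFF)]

if __name__ == "__main__":
    as_written = first_mismatch(described_and_upper, reference_and_upper, CASES)
    with_16 = first_mismatch(
        lambda rs, ui: described_and_upper(rs, ui, shift=16),
        reference_and_upper,
        CASES,
    )
    print("shift=32 mismatch:", as_written)
    print("shift=16 mismatch:", with_16)
```

Testing one instruction at a time with a few boundary values keeps the failing input small, which is what makes a copy-and-paste slip quick to locate.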
Performance
- Benchmark suite
  - SPECint95
- Environment
  - DEC Alpha AXP-21264 running at 625 MHz
  - OSF/1 v4.0
- Two versions of the Dixie binaries
  - DVM: "pure" Dixie binaries
  - Speedy: Dixie binaries optimized using Speedy
DVM slowdown
[Figure: bar chart of slowdown (y-axis from 0 to 150) for the DVM and Speedy versions, Alpha on Alpha, across the SPECint95 benchmarks go, m88ksim, gcc, compress, li, ijpeg, perl, and vortex.]
Summary
- Binary translation & optimization
  - Are becoming important tools in the embedded market
  - Promise lower development costs
    - When changing architectures
  - Are also of interest to major computer manufacturers
    - IA-64 emulation
    - Transmeta
    - FX!32 (now obsolete)
- DIXIE
  - Robust tool that meets most translation demands
  - Multi-ISA, multi-platform