DIXIE: Binary Translation and Optimization for Multiple ISAs
Computer Architecture Department
Universitat Politècnica de Catalunya-Barcelona
www.ac.upc.es/dixie

UPC people involved
- Roger Espasa
- Agustín Fernández
- Manel Fernández
- Victor Moya
- Juan Lopez
- Silvia Cernuda
- Antonio Parada
- Albert Ribé
- Álex Ramírez

Dixie
- Static binary translator
  - Accepts multiple ISAs (Alpha, x86, PPC, Mips, Convex)
  - Translates to a common IR (the Dixie ISA)
- Static binary instrumentation
  - Works on the common IR but reflects the source ISA
- Static binary optimizer
  - Optimizes the common IR
  - Generates native code from the common IR
  - Also supports multiple targets (Alpha, Mips)
- Dixie Virtual Machine
  - Runs binaries expressed in the common IR
  - Also runs binaries that mix common IR and native code

Dixie overview
[Figure: target binaries (target ISAs: Alpha, Convex, PowerPC, x86, Mips), DIXIEC, Dixie binary, JANGO, user specification, SPEEDY, native ISAs (Alpha, Mips, ...), DVM (Dixie Virtual Machine), user simulator]

Outline
- Motivation
- DIXIE Architecture
- Debugging Tools
- Performance
- Summary

Binary Translation
- For embedded processors
  - The embedded market is moving rapidly
  - Processors change frequently
  - Software (development, porting) is a major cost
  - Binary translation is cheaper than retargeting gcc
- Goals
  - Retargeting must be FAST and EASY
  - Support different ISAs
  - Provide good debugging tools
    - To ease writing ISA descriptions
    - To verify the correctness of translations
- Techniques
  - Static translation (as much as possible)
  - Some dynamic translation (only if necessary)

Binary Optimization
- Binary translation inevitably introduces overheads
- Use static and dynamic optimization to
  - Adapt better to a new chip
  - Offset the overheads of static binary translation
- Goals
  - Eliminate overheads due to
    - The manual translation process
    - The limited expressiveness of the intermediate ISA
  - Develop the optimizer incrementally
- Techniques
  - Static optimization (as much as possible)
  - Dynamic optimization (only if necessary)
  - Optimized blocks still run within the Virtual Machine

Instrumentation
- Instrumentation of program binaries
  - For computer architecture research
  - Motivated by the lack of access to "exotic" machines
  - The historical origin of Dixie
- Many classes of tools exist, but...
  - Different tools for different machines
  - Porting tools is difficult
  - Few tools allow research on vector machines or new ISAs
  - Wrong-path information is missing
- Dixie goals
  - Cross-platform instrumentation
  - Research on multiple and discontinued ISAs
  - Full architecture coverage
  - Wrong-path information

Jango: breakpoints (trace)
A target-binary fragment as translated by DIXIEC and then instrumented by JANGO, which inserts a TRACE instruction before each load:

  Target binary     DIXIEC                    JANGO
  mov a0,a1         MOV.lo.32 r11,r10         MOV.lo.32 r11,r10
  ld.w @8(a1),a2    LOAD.lo.32 r500,r11,#8    TRACE vpc,r11,#8
                                              LOAD.lo.32 r500,r11,#8
                    LOAD.lo.32 r12,r500,#0    TRACE vpc,r500,#0
                                              LOAD.lo.32 r12,r500,#0
  sub.w #8,a2       SUB.c2.32 r12,r12,#8      SUB.c2.32 r12,r12,#8
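As a rough illustration of the JANGO column, the Python sketch below inserts a `TRACE vpc,<base>,<offset>` instruction in front of every `LOAD`. The `Insn` class and the `instrument_loads` function are hypothetical: they model Dixie instructions as plain opcode/operand strings and are not Jango's real data structures or its user-specification format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Insn:
    """Hypothetical textual model of one Dixie instruction."""
    opcode: str                # e.g. "LOAD.lo.32"
    operands: Tuple[str, ...]  # e.g. ("r500", "r11", "#8")

    def __str__(self) -> str:
        return f"{self.opcode} {','.join(self.operands)}"


def instrument_loads(block: List[Insn]) -> List[Insn]:
    """Insert 'TRACE vpc,<base>,<offset>' before every LOAD.

    Assumes LOAD operands are (dest, base, offset), as in the DIXIEC
    listing; 'vpc' is kept as a literal placeholder operand.
    """
    out: List[Insn] = []
    for insn in block:
        if insn.opcode.startswith("LOAD"):
            _dest, base, offset = insn.operands
            out.append(Insn("TRACE", ("vpc", base, offset)))
        out.append(insn)
    return out


# DIXIEC output for the Jango example (copied from the slide).
dixiec_output = [
    Insn("MOV.lo.32", ("r11", "r10")),
    Insn("LOAD.lo.32", ("r500", "r11", "#8")),
    Insn("LOAD.lo.32", ("r12", "r500", "#0")),
    Insn("SUB.c2.32", ("r12", "r12", "#8")),
]

for insn in instrument_loads(dixiec_output):
    print(insn)
```

Running it prints the six instructions of the JANGO column in order. Treating `vpc` as the virtual PC of the traced instruction is an assumption based on the operand name.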
Speedy & DVM
- The Dixie binary is optimized by Speedy
  - Optimizations at the basic block (BB) level
  - Translates Dixie BBs into native code
  - Generates .speedy sections
- The Dixie binary runs on top of the DVM, which
  - Emulates the behavior of each Dixie instruction
    - By interpreting each Dixie instruction
    - By jumping into sequences of "Speedy" BBs
  - Interacts with the user simulator
    - Through trace instructions inserted by Jango
  - Maps target system calls onto host system calls
    - Through DixOS

DVM Portability
The DVM runs on all major hardware combinations:

            Little endian                 Big endian
  32 bits   x86 / Linux                   Power2 / AIX, Sparc / SunOS
  64 bits   Alpha / OSF1, IA64 / Linux    MIPS / IRIX

Speedy Architecture
- Front end
  - Understands the Dixie ISA
  - Optimizes Dixie code (NOP, VPC, CSE)
  - Lowers the representation
    - Loads virtual registers into physical registers
    - Local register allocation
    - Loads large constants into registers
- Back end
  - Translates the Dixie ISA into the native ISA
    - Instruction translation
    - Opcode selection
    - Big/little endian memory accesses
    - Alignment issues
- Peephole optimizer
  - Recognizes instruction sequences
  - Removes redundant loads
  - Removes redundant branches

Debugging
- Porting to a new ISA is not easy
  - Many "cut-and-paste" bugs
  - A trivial bug may take weeks to find without appropriate tools
- Example: the Dixie description of the PowerPC instruction andiu.

    andiu. ra, rs, ui
        MOV.lo.32   r(TMP0),ui
        SHL.lo.32   r(TMP0),r(TMP0),32
        AND.lo.32   (ra),r(rs),r(TMP0)
        CMPLT.c2.32 r(ICR(POSCRI(0))),r(ra),0
        CMPGT.c2.32 r(ICR(POSCRI(1))),r(ra),0
        CMPEQ.lo.32 r(ICR(POSCRI(2))),r(ra),0
        AND.lo.32   r(TMP0),r(XER),0x80000000
        CMPNE.lo.32 r(ICR(POSCRI(3))),r(TMP0),0

- We would like developers to "test-as-you-go" with every instruction description
  - Test each instruction almost in isolation
  - Quickly compare DVM and native results

Performance
- Benchmark suite: SPECint95
- Environment
  - DEC Alpha AXP-21264 running at 625 MHz
  - OSF/1 v4.0
- Two versions of the Dixie binaries
  - DVM: "pure" Dixie binaries
  - Speedy: Dixie binaries optimized with Speedy

[Figure: DVM slowdown, Alpha on Alpha. Series: DVM, Speedy. Benchmarks: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex. Y-axis: slowdown, 0 to 150.]

Summary
- Binary translation and optimization
  - Are becoming important tools in the embedded market
  - Promise lower development costs when changing architectures
  - Are also of interest to major computer manufacturers
    - IA-64 emulation
    - Transmeta
    - FX!32 (now obsolete)
- DIXIE
  - A robust tool that meets most translation demands
  - Multi-ISA, multi-platform
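The "test-as-you-go" idea from the Debugging section can be sketched in Python: run one instruction description on many random operands and compare it with a reference model. The reference assumes that `andiu.` is the POWER mnemonic for PowerPC `andis.`, i.e. ra = rs AND (ui << 16), with CR0 LT/GT/EQ set from the signed result and SO copied from XER[SO]. `reference_andiu`, `described_andiu` and `check` are hypothetical stand-ins for native execution and for DVM execution of the description. They are not Dixie tools, and modelling `SHL.lo.32` as a shift truncated to 32 bits is also an assumption.

```python
import random

MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def cr0_bits(ra: int, xer: int) -> tuple:
    """CR0 as (LT, GT, EQ, SO) for a 32-bit result; SO comes from XER[SO]."""
    s = to_signed32(ra)
    return (s < 0, s > 0, s == 0, (xer & 0x8000_0000) != 0)


def reference_andiu(rs: int, ui: int, xer: int):
    """Assumed native semantics: ra = rs & (ui << 16), then set CR0."""
    ra = rs & ((ui << 16) & MASK32)
    return ra, cr0_bits(ra, xer)


def described_andiu(rs: int, ui: int, xer: int, shift: int):
    """Step-by-step model of the Dixie description, with the shift
    amount exposed so that a wrong description can be tested."""
    tmp0 = ui & MASK32                  # MOV.lo.32 r(TMP0),ui
    tmp0 = (tmp0 << shift) & MASK32     # SHL.lo.32, assumed truncated to 32 bits
    ra = rs & tmp0                      # AND.lo.32 ra,rs,TMP0
    return ra, cr0_bits(ra, xer)        # CMPLT/CMPGT/CMPEQ, SO from XER


def check(shift: int, trials: int = 1000, seed: int = 0) -> int:
    """Count random inputs on which description and reference disagree."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        rs = rng.getrandbits(32)
        ui = rng.getrandbits(16)
        xer = rng.getrandbits(32)
        if described_andiu(rs, ui, xer, shift) != reference_andiu(rs, ui, xer):
            mismatches += 1
    return mismatches


print("shift 16 mismatches:", check(16))
print("shift 32 mismatches:", check(32))
```

With a shift of 16 the two models agree on every input. Under the assumed semantics, a description using any other shift amount disagrees on most random inputs, which is the kind of cut-and-paste error this style of check is meant to expose quickly.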