Designing Programmable Platforms: From ASIC to ASIP MPSoC 2005 Heinrich Meyr CoWare Inc., San Jose and Integrated Signal Processing Systems (ISS), Aachen University of Technology, Germany Agenda Facts & Conclusions Heterogeneous MPSoC » Energy Efficiency vs.Flexibility » How to explore the Design Space? ASIP Design Agenda Economics of SoC Development Conclusions Facts & Conclusion Core Proposition ASIP based Platforms (heterogenousMPSoC) Agenda Facts & Conclusions Heterogeneous MPSoC » Energy Efficiency vs.Flexibility » How to explore the Design Space? ASIP Design Agenda Economics of SoC Development Conclusions Heterogeneous MPSoC Trade-off between Flexibility and Energy -Efficiency Architectural Objectives Need more MOPS/Watt and MOPS/mm² to minimize the global performance measure for battery driven devices Energy / decoded Bit = (Joule/Bit) Computational Effiency vs. Flexibility Source: T.Noll, RWTH Aachen Enabling MP-SoC Design System Level Tools I: Application & IP Creation algorithm domain algorithmic exploration • Matlab • SPW • System Studio system application design Architecture Description Language block specification • LISATek Processor Synthesis • ConvergenSC Buscompiler High-level IP block design micro architecture domain block implementation • RTL Synthesis System Level Tools I: Application & IP Creation algorithm domain SystemC Transaction Level Modeling Architecture Description Language micro architecture domain algorithmic exploration abstract architecture • Matlab System • SPW •System Studio application design • MP-SoC Intermediate Representation MP-SoC platform design System Level Tools II: MP-SoC Platform Design virtual prototype • ConvergenSC Platform Creator • ConvergenSC Platform Creator block specification • LISATek Processor Synthesis • ConvergenSC Buscompiler High-level IP block design block implementation • RTL Synthesis Agenda Facts & Conclusions Heterogeneous MPSoC » Energy Efficiency vs.Flexibility » How to explore the Design Space? ASIP Design Agenda Economics of SoC Development Conclusions Processor Design Space Micro Architecture Design Instruction Set Design butterfly 0 butterfly 1 load/store • Exploit- regularity/parallelism in Instruction-Set Design Instruction-Set Design data flow/data storage - Compiler Design Compiler Design • VLIW,- SIMD, ? • Which instructions for compiler support? • Instruction Encoding? • How much general purpose registers? FE FE DC DC EX EX WB WB -Micro -MicroArchitecture ArchitectureDesign Design • Pipeline length ? MESCAL 2: • Shared resources ? • Bypass ? • Parallel execution units ? Inclusively identify the Optimal Optimaldesign designrequires requirespowerful powerfultools tools architectural space and RTL Design which cache required ? andautomation automation!!Soc Integration • Area constraints met? • Clock frequency? -RTL Design -RTL Design --RTL RTLISS ISSCo-verification Co-verification Core MMU Cache -System -SystemIntegration Integration bus fast enough? --Embedded EmbeddedSoftware Software Simulation communication? Simulation Memory Peripheral Architecture Description Language based Processor Design The purpose of an architecture description language (e.g LISA) is: » To allow for an iterative design to efficiently explore architecture alternatives MESCAL–Compiler” 3: » To jointly design “Architecture and on chip communication Efficiently describe » To automatically generate hardware (path to the ASIP implementation) » To automatically generate tools » Assembler ,Linker, Compiler, Simulator, co-simulation interfaces From a single model at various level of temporal and spatial abstraction LISA 2.0 - Abstraction Levels architecture very detailed phase accurate model + IRQ, etc. Functional units, Registers, Memories Pseudo Resources (e.g. cc-variables) instruction accurate model ac cu ra cy cycle accurate model + Pipelines no details high level model Pseudo Processor Cycles Instructions Instructions time Phases Empty Model Describe/Adopt Processor Model Generate Tools Application RISC Sample VLIW Sample DSP Sample Custom Processor Model (LISA 2.0 language) LISATek Software MESCAL 3: Software Tool Tool FFT Processor Chain Chain Processor Efficiently describe Designer LISATek IP and evaluate the Samples Rapid -targetable simulation -generation allows Rapidmodeling modelingand andre re-targetable simulation++code code-generation allowsfor: for: ASIP joint optimization of application and architecture joint optimization of application and architecture Generate... RTL RTL MESCAL 5: RTL SoC Executable Integration Kit Software Sucessfully (e.g.:SystemC) ) Platform deploy e.g.:SystemC the ASIP Function and instruction level profiling reveals hot-spots -> special purpose instructions •Instruction Set Synthesis •Memory architecture •Verification E X P L O R A T I O N Current Work I M LISA Description P LISA Compiler C-Compiler L HDL Generator E Optimization Assembler M Linker E SystemC, VHDL, Verilog Output Simulator N T Model Verification Gate Level Synthesis A & Evaluation T Evaluation Results Evaluation Results I Chip Area, Clock Speed, Profile Information, O Power Consumption3: Application Performance MESCAL N …..evaluate the ASIP Target Architecture June 10, 2004 A Novel Approach for Flexible and Consistent ADL-driven ASIP Design Gunnar Braun Achim Nohl CoWare, Inc DAC Booth #1844 www.CoWare.com Weihua Sheng, Jianjiang Ceng, Manuel Hohenauer, Hanno Scharwächter, Rainer Leupers, Heinrich Meyr Integrated Signal Processing Systems (ISS) Aachen University of Technology Germany Introduction Architecture Description Languages (ADL) • Automatic generation of Software Toolkit (Compiler, Assembler, Linker, IS-Simulator) • Architecture Exploration • SystemC models, RTL code, verification tools, ... Challenges: • Different tools need different information • Unambiguous, redundancy-free architecture model (rather than tools description) • Multiple abstraction levels (instruction-accurate and/or cycle-accurate) Tool Requirements: Compiler aa == bb ++ c; c; C mul rd = rs, rt rs rt ld rd = @ @ * LD rd rd add rd = rs, rt st @ = rs C C Compiler Compiler rs add add cc == a, a, bb Assembly rt rs + ST rd @ Tool Requirements: Simulator Machine Code add add r5 r5 == r2, r2, r1 r1 mul rd = rs, rt MUL_read (rs, rt); MUL_add (); Update_flags (); writeback (rd); ld rd = @ LSU_addrgen(); data_bus.req(); data_bus.read(); writeback (rd); Simulator Simulator add rd = rs, rt ALU_read (rs, rt); ALU_add (); Update_flags (); writeback (rd); st @ = rs LSU_addrgen(); LSU_read(rs); data_bus.req(); data_bus.write(rs); Simulation Code (C) ALU_read ALU_read(r2, (r2,r1); r1); ALU_add ALU_add(); (); Update_flags Update_flags(); (); writeback writeback(r5); (r5); ADL Model aa == bb ++ c; c; ADL Model add add r5 r5 == r2, r2, r1 r1 SYNTAX {{ SYNTAX “ADD“ “ADD“ dst, dst, src1, src1, src2 src2 }} add rd = rs, rt CODING {{ CODING ALU_read (rs, rt); 0b0010 dst 0b0010 dst src1 src1 src2 src2 ALU_add (); }} Update_flags (); writeback (rd); C C Compiler Compiler BEHAVIOR BEHAVIOR {{ ALU_read (src1, src2); ALU_read (src1, src2); rs rt ALU_add ALU_add (); (); Update_flags Update_flags (); + (dst);(); writeback writeback (dst); }} Simulator Simulator rd SEMANTICS SEMANTICS {{ src1 src1 ++ src2 src2 Æ Æ dst; dst; }} add add cc == a, a, bb ALU_read ALU_read(r2, (r2,r1); r1); ALU_add ALU_add(); (); Update_flags Update_flags(); (); writeback writeback(r5); (r5); Problem Statement • Compiler and Simulator need different information: • Compiler: C operation to instruction(s) WHAT is the instruction good for? Purpose? • Simulator: instructions to sequence of operations HOW is the instruction executed? What actions to perform? • Architecture Designer‘s Perspective: ? ? ? src1 src1 ++ src2 src2 Æ Æ dst; dst; ALU_read ALU_read (src1, (src1, src2); src2); ALU_add (); ALU_add (); Update_flags Update_flags (); (); write back (dst); writeback (dst); Examples ASDSP FPGA Implementation Myjung Sunwoo, Ajiou University, ASDSP Core Design 9SEC 0.18um Synthesis • Gate : 77,000 • Program Memory : 4 Kbyte, Data Memory : 8 Kbyte • Frequency : 290MHz • Power consumption : 0.87W (3mW/MHz) FPGA Implementation 9 iProve Xilinx xc2v6000 Support the Special Instruction Set for FFT Operation and the BMU Instruction Improve the Performance for OFDM Communication The ICORE A low-power ASIP for Infineon DVB-T 2nd generation Single-Chip Receiver: • ASIP for DVB-T acquisition and tracking algorithms (sampling-clock-synchronization, interpolation / decimation, carrier frequency offset estimation) • Harvard Architecture • 60 mostly RISC-like Instructions & Special Instructions for CORDIC-Algorithm • 8x32-Bit General Purpose Registers, 4x9-Bit Address Registers • 2048x20-Bit Instruction ROM, 512x32-Bit Data Memory • I2C Registers and dedicated interfaces for external communication The Motorola M68HC11 Architecture The Motorola M68HC11 Architecture Increasing SW Content- but How? Architecture Overview M68HC11 CPU Architecture : Hot spots » 8-bit micro-controller. » Harvard Architecture » » » » » » » » 7 CPU Registers. 6 different Addressing Modes. Shared data and program bus. : stalled data access Instruction width : 8,16, 24, 32, 40 : multi-cycle fetch 8-bit opcode : 181 instructions Clock speed : ~200 MHz Performance : : non-pipelined Area : 15K to 30K (DesignWare® Library) Architecture Development with LISA FE 16 32 32 DC 32 32 EX ++ pipelined pipelined architecture architecture ++ separate program and data bus separate program and data bus 16 0x0000 512Bytes int. RAM ACCU Accu A 64Bytes Conf. Reg. Accu B Index X Index Y 3.5K ext. RAM Stack Pointer 61K ext. RAM 0x10000 Condition Results •Area < 23k gates •Clock speed ~ 200 MHz •Execution time speed up 62 % for spanning tree application •Mapped onto Xilinx FPGA Architecture Development with LISA •Studying the architecture 4 days •Basic architecture modifications 2 days •Grouping and coding of the instructions 1 day •Writing the LISA model -basic syntax and coding 4 days -behavior section 6 days •Validation 4 days •HDL Generation Total 2 days 23 days Design of Application Specific Processor Architectures Rainer Leupers RWTH Aachen University Software for Systems on Silicon leupers@iss.rwth-aachen.de Institute for Integrated Signal Processing Systems Overview 1. Introduction 2. ASIP design methodologies 3. Software tools 4. ASIP architecture design 5. Case study 6. Advanced research topics 2005 © R. Leupers 4 1. Introduction 2005 © R. Leupers 5 Embedded system design automation ¾ Embedded systems Special-purpose electronic devices Very different from desktop computers ¾ Strength of European IT market Telecom, consumer, automotive, medical, ... Siemens, Nokia, Bosch, Infineon, ... ¾ New design requirements Low NRE cost, high efficiency requirements Real-time operation, dependability Keep pace with Moore´s Law 2005 © R. Leupers 6 What to do with chip area ? 2005 © R. Leupers 7 Example: wireless multimedia terminals ¾ Multistandard radio UMTS GSM/GPRS/EDGE WLAN Bluetooth UWB … ¾ Multimedia standards MPEG-4 MP3 AAC GPS DVB-H … 2005 © R. Leupers Keyissues: issues: Key Timeto tomarket market(≤ (≤12 12months) months) ••Time Flexibility(ongoing (ongoingstandard standard ••Flexibility updates) updates) Efficiency(battery (batteryoperation) operation) ••Efficiency 8 Application specific processors (ASIPs) „As the performance of conventional microprocessors improves, they first meet and then exceed the requirements of most computing applications. Initially, performance is key. But eventually, other factors, like customization, become more important to the customer...“ [M.J. Bass, C.M. Christensen: The Future of the Microprocessor Business, IEEE Spectrum 2002] design budget = (semiconductor revenue) × (% for R&D) growth ≈ 15% ≈ 10% # IC designs = (design budget) / (design cost per IC) growth ≈ 15% growth ≈ 50-100% [Keutzer05] → Customizable application specific processors as reusable, programmable platforms 2005 © R. Leupers 9 Digital Signal Processors Application Specific Instruction Set Processors Field Programmable Devices SW Why use ASIPs? Design • Higher efficiency for given range of applications • IP protection Design • Cost reduction (no HW royalties) • Product differentiation Application Specific ICs Physically Optimized ICs 105 . . . 106 Log F L E X I B I L I T Y General Purpose Processors Log P O W E R D I S S I P A T I O N Efficiency and flexibility Log P E R F O R M A N C E 103 . . . 104 Source: T.Noll, RWTH Aachen 2005 © R. Leupers 10 2. ASIP design methodologies 2005 © R. Leupers 12 ASIP architecture exploration Application initial processor architecture optimized processor architecture 2005 © R. Leupers Profiler Compiler Simulator Assembler Linker 13 Expression (UC Irvine) 2005 © R. Leupers 14 Tensilica Xtensa/XPRES Source: Tensilica Inc. 2005 © R. Leupers 15 MIPS CorXtend/CoWare CorXpert H o t s p o t Profile and identify custom instructions Replace critical code with special instruction 2 User Defined Instruction + Synthesize HW and profile with MIPSsim and extensions 3 CorExtend Module 1 User Defined Instruction 2005 © R. Leupers 16 CoWare LISATek ASIP architecture exploration ¾ Integrated embedded processor development environment ¾ Unified processor model in LISA 2.0 architecture description language (ADL) ¾ Automatic generation of: SW tools HW models 2005 © R. Leupers 17 LISA operation hierarchy main Reflects hierarchical organization of ISAs decode addr imm linear 2005 © R. Leupers cond opcode opnds cycl control arithm move short add sub mul and or long 18 LISA operations structure LISA operation References to other operations DECLARE Binary coding CODING Assembly syntax SYNTAX Resource access, e.g. registers EXPRESSION Initiate “downstream” operations in pipe ACTIVATION BEHAVIOR Computation and processor state update C compiler generation SEMANTICS 2005 © R. Leupers 19 LISA operation example OPERATION ADD { DECLARE { GROUP src1, src2, dest = { Register } } CODING { 0b1011 src1 src2 dest } SYNTAX { “ADD” dest “,” src1 “,” src2 } BEHAVIOR { dest = src1 + src2; } } OPERATION Register { DECLARE { LABEL } C/C++ Code ADD index; CODING { index } src1 src2 dest SYNTAX { “R” index } EXPRESSION{ R[index] } Register } 2005 © R. Leupers Register Register 20 Exploration/debugger GUI ••Application Applicationsimulation simulation ••Debugging Debugging ••Profiling Profiling ••Resource Resourceutilization utilizationanalysis analysis ••Pipeline Pipelineanalysis analysis ••Processor Processormodel modeldebugging debugging ••Memory Memoryhierarchy hierarchyexploration exploration ••Code Codecoverage coverageanalysis analysis ••... ... 2005 © R. Leupers 21 Some available LISA 2.0 models ¾ DSP: • VLIW: Texas Instruments TMS320C54x Analog Devices ADSP21xx Motorola 56000 ¾ RISC: MIPS32 4K ESA LEON SPARC 8 ARM7100 ARM926 2005 © R. Leupers – Texas Instruments TMS320C6x – STMicroelectronics ST220 • µC: – MHS80C51 • ASIP: – Infineon PP32 NPU – Infineon ICore – MorphICs DSP 22 3. Software tools 2005 © R. Leupers 23 Tools generated from processor ADL model Application Profiler Compiler Simulator Assembler Linker 2005 © R. Leupers 24 Instruction set simulation Interpretive: • flexible • slow (~ 100 KIPS) Compiled: • fast (> 10 MIPS) • inflexible • high memory consumption JIT-CCS™: • „just-in-time“ compiled • SW simulation cache • fast and flexible 2005 © R. Leupers Decode Instruction Application Execute Memory Run-Time Run-Time Instruction Behavior Application Simulation Compiler Compile-Time Compile-Time Program Memory Program Memory Run-Time Run-Time Instruction Application Execute Compiled Simulation Instruction Behavior Decode Compiled Simulation Cache Execute Run-Time Run-Time 25 JIT-CC simulation performance 9 100 8 90 7 80 70 6 60 5 50 4 40 3 30 C 2005 © R. Leupers 6 25 12 51 2 10 24 20 48 40 96 81 9 16 2 38 32 4 76 8 0 8 0 64 10 32 1 8 16 20 o In m p te ile rp d re ti v e 2 Cache Miss Ratio [%] Performance [MIPS] • Dependent on simulation cache size • 95% of compiled simulation performance @ 4096 cache blocks (10% memory consumption of compiled sim.) • Example: ST200 VLIW DSP Cache size [records] 26 Why care about C compilers? ¾ Embedded SW design becoming predominant manpower factor in system design ¾ Cannot develop/maintain millions of code lines in assembly language ¾ Move to high-level programming languages 2005 © R. Leupers 27 Why care about compilers? ¾ Trend towards heterogeneous multiprocessor systems-onchip (MPSoC) ¾ Customized application specific instruction set processors (ASIPs) are key MPSoC components ¾ How to achieve efficient compiler support for ASIPs? ASIC ASIC ASIC ASIC CPU CPU ASIP ASIP ASIP ASIP CPU CPU ASIP ASIP Memory Memory Memory Memory Memory Memory CPU CPU Mem Mem 2005 © R. Leupers 28 C compiler in the exploration loop „Compiler/Architecture Co-Design“ ¾ Efficient C-compilers cannot be designed for ARBITRARY architectures! Application Software Compiler Processor Results ¾ Compiler and processor form a UNIT that needs to be optimized! ¾ “Compiler-friendliness“ needs to be taken into account during the architecture exploration! 2005 © R. Leupers 29 Retargetable compilers Classical compiler source code Compiler Retargetable compiler source code processor model processor model Compiler Compiler asm code asm code 2005 © R. Leupers 30 GNU C compiler (gcc) • Probably the most widespread retargetable compiler • Mostly used as a native Unix/Linux compiler, but may operate as a cross-compiler, too • Support for C/C++, Java, and other languages • Comes with comprehensive support software, e.g. runtime and standard libraries, debug support • Portable to new architectures by means of machine description file and C support routines “The “Themain maingoal goalof ofGCC GCCwas wasto tomake makeaagood, good,fast fastcompiler compilerfor for machines machinesin inthe theclass classthat thatthe theGNU GNUsystem systemaims aimsto torun runon: on:32-bit 32-bit machines machinesthat thataddress address8-bit 8-bitbytes bytesand andhave haveseveral severalgeneral generalregisters. registers. Elegance, Elegance,theoretical theoreticalpower powerand andsimplicity simplicityare areonly onlysecondary.” secondary.” 2005 © R. Leupers 31 CoSy compiler system (ACE) • Universal retargetable C/C++ compiler • Extensible intermediate representation (IR) • Modular compiler organization • Generator (BEG) for code selector, register allocator, scheduler 2005 © R. Leupers © ACE - Associated Compiler Experts 34 LISATek C compiler generation LISA processor model SYNTAX {{ SYNTAX “ADD“ “ADD“ dst, dst, src1, src1, src2 src2 }} CODING {{ CODING 0b0010 dst 0b0010 dst src1 src1 src2 src2 }} BEHAVIOR BEHAVIOR {{ ALU_read ALU_read (src1, (src1, src2); src2); ALU_add ALU_add (); (); Update_flags Update_flags (); (); writeback writeback (dst); (dst); }} SEMANTICS SEMANTICS {{ src1 src1 ++ src2 src2 Æ Æ dst; dst; }} … … GUI Autom. analyses Manual refinement CoSysystem system CoSy Compiler CCCompiler 2005 © R. Leupers 36 LISATek compiler generation C-Code ASM-Code int a,b,c; a = b+1; c = a<<3; … LD R1, [R2] ADD R1, #1 SHL R1, #3 … CodeFrontend RegisterOpt Selector Allocator FE DE InstructionFetch ADD SUB JMP Backend Scheduler ADD … ALU …#1 SUB MUL JMP 2 1 ADD 2 3Mem Jump Decoder Pipeline Control EX WB WriteBack Decoder Registers R[0..31] …R[i] … 2005 © R. Leupers Prog Data RAM RAM 37 Compiled code quality: MIPS example ¾LISATek ¾LISATekgenerated generatedC-Compiler C-Compiler ¾Out-of-the-box ¾Out-of-the-boxC-Compiler C-Compiler ¾No ¾Nomanual manualoptimizations optimizations ¾Development ¾Developmenttime timeof ofmodel model approx. approx.22weeks weeks gcc gccC-Compiler C-Compiler ¾gcc ¾gccwith withMIPS32 MIPS324kc 4kcbackend backend ¾Used ¾Usedby bymost mostMIPS MIPSusers users ¾Large ¾Largegroup groupof ofdevelopers, developers, several severalman-years man-yearsof ofoptimization optimization Size Cycles 140.000.000 80.000 120.000.000 70.000 60.000 100.000.000 Overhead Overheadof of10% 10%in incycle cyclecount countand and17% 17%in incode codedensity density 50.000 80.000.000 Cycles Size 40.000 60.000.000 30.000 40.000.000 20.000 20.000.000 10.000 0 0 gcc,-O4 gcc,-O2 2005 © R. Leupers cosy,-O4 cosy,-O2 gcc,-O4 gcc,-O2 cosy,-O4 cosy,-O2 38 Demands on code quality ¾ Compilers for embedded processors have to generate extremely efficient code Code size: » system-on-chip » on-chip RAM/ROM Performance: » real-time constraints Power/energy consumption: » heat dissipation » battery lifetime 2005 © R. Leupers 39 Compiler flexibility/code quality trade-off variety of embedded processors unification retargetable compilation dedicated optimization techniques specialization DSP 2005 © R. Leupers NPU VLIW 40 Adding processor-specific code optimizations ¾ High-level (compiler IR) Enabled by CoSy´s engine concept ¾ Low-level (ASM): LISA LISACC Compiler Compiler .C .C Unscheduled Unscheduled .asm .asm Assembly API Scheduled Scheduled&& Optimized Optimized .asm .asm Optimization 11 Optimization Optimization 22 Optimization Optimization Optimization33 Binary Code Generation Assembler Assembler 2005 © R. Leupers Linker Linker .out 41 4. ASIP architecture design 2005 © R. Leupers 47 ASIP implementation after exploration 2005 © R. Leupers 48 Unified Description Layer LISA HDL Generation Register-Transfer-Level Gate–Level Synthesis (e.g. SYNOPSYS design compiler) Gate–Level 2005 © R. Leupers 49 Challenges in Automated ASIP Implementation ADL: HDL: Instructions Arithmetic Mul Multiplier (MUL) Control JMP Mac 1:1 Mapping BRC Multiplier (MAC) Independent description of instruction behavior: Independent mapping to hardware blocks: + Efficient Design Space Exploration - Insufficient architectural efficiency by 1:1 mapping 2005 © R. Leupers 50 Unified Description Layer LISA Structure & Mapping (incl. JTAG/DEBUG) Optimizations Unified Description Layer Backend (VHDL, Verilog, SystemC) Register-Transfer-Level Gate–Level Synthesis (e.g. SYNOPSYS design compiler) Gate–Level 2005 © R. Leupers 51 Optimization strategies LISA: separate descriptions for separate instructions Goal: share hardware for separate instructions Instruction A Instruction B LISA Operation A LISA Operation B Mutual Exclusiveness a Possible Optimizations • ALU Sharing b c d + + x y a c b d + x,y 2005 © R. Leupers 52 Optimization strategies LISA: separate descriptions for separate instructions Goal: same hardware for separate instructions Instruction A Instruction B LISA Operation A LISA Operation B Mutual Exclusiveness … • Path Sharing Resource Sharing Register Array … • ALU Sharing DataA … Possible Optimizations Path PA AddressA Path PB DataB AddressB • ... … Register Array 2005 © R. Leupers DataA, DataB AddressA AddressB 53 5. Case study 2005 © R. Leupers 54 Motorola 6811 Project Goals: • Performance (MIPS) must be increased • Compatibility on the assembly level for reuse of legacy code (Integration into existing tool flow) • Royalty free design Î compatible architecture developed with LISA using RTL processor synthesis 2005 © R. Leupers 55 Motorola 6811 legacy code compiler assembly assembler 0100101010011010 1110010110101111 0000110110110100 assee IInnccrreea nccee!!!! rm maan PPeerrffoor IPPSS)) ((M MI ? 6812 6811 2005 © R. Leupers 56 Motorola 6811 Bluetooth app. 6811 compiler assembly assembly level compatible assembler 0100101010011010 1110010110101111 0000110110110100 LISA Synthesized Architecture 2005 © R. Leupers 57 Architecture Development original 6811 Processor LISA 6811 Processor 8 bit instructions 16 bit instructions 16 bit instructions 32 bit instructions 24 bit instructions 32 bit instructions 40 bit instructions Instruction Instruction is is fetched fetched by by 88 bit bit blocks: blocks: 16 16 bit bit are are fetched fetched simultaneously: simultaneously: Î Î up up to to 55 cycles cycles for for fetching! fetching! Î Î max max 22 cycles cycles for for fetching! fetching! ++ pipelined pipelined architecture architecture ++ possibility possibility for for special special instructions instructions 2005 © R. Leupers 58 Tools Flow and RTL Processor Synthesis C-Application 6811 compiler LISA tools Assembly LISA model LISA assembler Executable 2005 © R. Leupers 6811 compatible architecture generated completely in VHDL 1) VLSI Implementation: Area: <17kGates Clock Speed: ~154 MHz 2) Mapped onto XILINX FPGA 59 References ¾ R. Leupers: Code Optimization Techniques for Embedded Processors - Methods, Algorithms, and Tools, Kluwer, 2000 ¾ R. Leupers, P. Marwedel: Retargetable Compiler Technology for Embedded Systems - Tools and Applications, Kluwer, 2001 ¾ A. Hoffmann, H. Meyr, R. Leupers: Architecture Exploration for Embedded Processors with LISA, Kluwer, 2002 ¾ C. Rowen, S. Leibson: Engineering the Complex SoC: Fast, Flexible Design with Configurable Processors, Prentice Hall, 2004 ¾ M. Gries, K. Keutzer, et al.: Building ASIPs: The Mescal Methodology, Springer, 2005 ¾ P. Ienne, R. Leupers (eds.): Customizable and Configurable Embedded Processor Cores, Morgan Kaufmann, to appear 2006 2005 © R. Leupers 75