(Haupt)-Seminar: ADAMS
Synthesis of RTL Descriptions for Architecture Exploration
Yazan Boshmaf, INFOTECH (ESE)

“The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency.”
- Bill Gates

Agenda
1. Introduction
  – Current Market Trends & Adaptation
  – Design Automation & EDAs
  – Research Area
2. Full System-Level Design Flow
  – Architecture Exploration / Tradeoffs
  – Architecture Design
3. Architecture Exploration at High Abstraction Levels
  – Architecture Description Language (ADL)
  – ADL for Design Space Exploration
4. RTL Synthesis for Architecture Exploration
  – Proposed Approaches/Frameworks
  – Improvement
5. Asynchronous Design Support
  – Asynchronous Logic
  – Transformation Framework
  – RTL Synthesis for Architecture Exploration: Glue It
6. Summary & References

Current Market Trends
• The demand for high-performance computing and the competition between IC vendors are increasing.
• The time elapsed before a product reaches the market (time to market) has a major impact on profit.
• Current trend: demand for high-performance, multifunctional, low-power electronics that must be satisfied in a short time.
  – Example: mobile phones
    • Smaller is better (miniaturization)
    • Multifunction: camera, phone, MP3 player, organizer, and storage
    • Long battery life
⇒ This results in conflicting design aims:
  – High-performance, complex designs (which need time)
  – Minimum design time

Y2K+ Productivity Challenge
Figure 1. Design productivity challenge

Market Adaptation
• The “survival of the fittest” rule always applies: the sooner you fit your product into the market, the more profit you will get. But how?
  – New approaches have to be investigated to automate the design cycle of embedded processors (high performance and complexity).
  – New tools have to be developed to shorten the overall design time, resulting in a shorter time to market (minimum design time).
⇒ Electronic Design Automation (EDA) tools

Design Automation & EDAs
• Electronic Design Automation (EDA) is the category of tools for the automated design and production of electronic systems, ranging from printed circuit boards (PCBs) to integrated circuits (ICs).
• EDA tools implement design automation methods researched by academics, vendors, and industry.
• Design automation methodologies and EDA tool development are the focus of the Design Automation Conference (DAC). Please check http://www.dac.com for the latest conference date.

EDA Industry Growth
• The EDA Consortium reports 15% industry revenue growth for 2006.
• Significant increase in DAC participation from all fields.
Figure 2. Paper participation in DAC [Alb03]

Research Area
Figure 3. System-level design flow, EDA-centric

Full System-Level Design Flow
System Specification (application + requirements)
⇒ Architecture Exploration
  – HW/SW partitioning and mapping
  – Global communication structure
  – Performance testing
⇒ Abstract Architecture (golden model)
⇒ Architecture Design
  – Component design (adder)
  – HW/SW interface design (bus)
⇒ RTL Architecture (µarchitecture)
Figure 4. System-level design flow [Fir03]

Architecture Exploration: Tradeoffs
• High-performance carry-lookahead (CLA) adder vs. chip area
Figure 5. Relationship between design performance and design parameters [Pra98]. Point II is the optimum.

Architecture Exploration at High Abstraction Levels
Figure 6. Standard ASIP architecture

Architecture Description Language (ADL)
• Modeling has a major role in realizing design automation for embedded processors.
• ADLs are specification languages used at high abstraction levels to describe and explore design architectures.
• An ADL must be:
  – Descriptive
  – Simple
  – Suitable for design automation

ADL Classification

ADL vs. Other Languages
Figure 7. ADL vs. non-ADL languages [Pra06]
Figure 8. ADL-driven design automation of embedded processors [Pra06]

ADL for Design Space Exploration
Figure 9. ADL-driven design space exploration [Aru03]

Case Study: Architecture Implementation Using LISA
Figure 10. Exploration and implementation using LISA [Oli02]

    RESOURCE {
      REGISTER int GPR[0..15];
      MEMORY int prog_mem { SIZE(1024); }
      PIPE pipe = { FE; DC; EX; WB };
    }
    ...
    OPERATION decode in pipe.DC {
      DECLARE { GROUP instr = { add || sub }; }
      BEHAVIOR {
        PIPELINE_REGISTER(p, DC/EX).a = R1;
        PIPELINE_REGISTER(p, DC/EX).b = R2;
      }
      ACTIVATION { instr }
    }
    OPERATION add in pipe.EX {
      DECLARE { INSTANCE writeback; }
      BEHAVIOR {
        PIPELINE_REGISTER(p, EX/WB).r = PIPELINE_REGISTER(p, DC/EX).a + PIPELINE_REGISTER(p, DC/EX).b;
      }
      ACTIVATION { writeback }
    }
    OPERATION writeback in pipe.WB {
      BEHAVIOR { R = PIPELINE_REGISTER(p, EX/WB).r; }
    }
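
The decode, add, and writeback operations of the LISA case study can be mimicked with a small behavioural model. The following is only a Python sketch under stated assumptions, not output of the LISA tools: the names `PipelineModel` and `run_add`, the explicit `dst` destination index (the slide only writes `R`), and the one-instruction-at-a-time execution without stage overlap are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineModel:
    """Toy model of the DC -> EX -> WB path (assumed names, no stage overlap)."""
    gpr: list = field(default_factory=lambda: [0] * 16)  # REGISTER int GPR[0..15]
    dc_ex: dict = field(default_factory=dict)             # PIPELINE_REGISTER(p, DC/EX)
    ex_wb: dict = field(default_factory=dict)             # PIPELINE_REGISTER(p, EX/WB)

    def decode(self, src1, src2, dst):
        # DC stage: latch both source operands into the DC/EX pipeline register.
        # 'dst' is an assumption; the slide abbreviates the target register as R.
        self.dc_ex = {"a": self.gpr[src1], "b": self.gpr[src2], "dst": dst}

    def add(self):
        # EX stage: r = a + b, written into the EX/WB pipeline register.
        self.ex_wb = {"r": self.dc_ex["a"] + self.dc_ex["b"],
                      "dst": self.dc_ex["dst"]}

    def writeback(self):
        # WB stage: commit the result to the register file.
        self.gpr[self.ex_wb["dst"]] = self.ex_wb["r"]


def run_add(model, src1, src2, dst):
    """Push one add instruction through decode, execute, and writeback."""
    model.decode(src1, src2, dst)  # decode activates the instruction (here: add)
    model.add()                    # add activates writeback
    model.writeback()
    return model.gpr[dst]
```

Calling `run_add(model, 1, 2, 5)` after loading `gpr[1]` and `gpr[2]` walks one instruction through the same activation chain that the `ACTIVATION` sections express declaratively.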

RTL Synthesis for Architecture Exploration: A Gap
• Generating a synthesizable RTL description during the architecture exploration phase of system design is not new to design automation (LISA HDL model).
• In practice, the generated RTL code doesn't reflect design-specific physical requirements such as power consumption and area.
⇒ The system designer has to optimize the generated code and carry out custom design tasks.

Proposed Approaches/Frameworks
• Approaches:
  – Generic parameterized processor cores: limited to specific core templates whose toolkits can be extended only to a certain degree (Jazz DSPs).
  – Processor specification languages (ADLs): a synthesizable-RTL-aware ADL approach.
    • RTL (+) for LISA
    • RTL (+) for EXPRESSION

RTL (+) for LISA: The Framework
Figure 11. Enhanced exploration and implementation based on LISA [Oli04]

LISA Specification ⇒ LISA Compiler ⇒ Generated HDL/HW

    RESOURCE {
      REGISTER int GPR[0..15];
      MEMORY int prog_mem { SIZE(1024); }
      PIPE pipe = { FE; DC; EX; WB };
    }
    ...
⇒ Explicit HW description / pipeline

    OPERATION decode in pipe.DC {
      DECLARE { GROUP instr = { add || sub }; }
      BEHAVIOR {
        PIPELINE_REGISTER(p, DC/EX).a = R1;
        PIPELINE_REGISTER(p, DC/EX).b = R2;
      }
      ACTIVATION { instr }
    }
    OPERATION add in pipe.EX {
      DECLARE { INSTANCE writeback; }
      BEHAVIOR {
        PIPELINE_REGISTER(p, EX/WB).r = PIPELINE_REGISTER(p, DC/EX).a + PIPELINE_REGISTER(p, DC/EX).b;
      }
      ACTIVATION { writeback }
    }
⇒ Implicit HW description / controller

    OPERATION writeback in pipe.WB {
      BEHAVIOR { R = PIPELINE_REGISTER(p, EX/WB).r; }
    }
  SystemC customized datapath:
    // Address of Reg R, 5 for example
    AW_R_out = &R;
    EW_R_out = 1;  // Valid flag
    REG_R_out = EX_WB_R_in;
⇒ Non-formalized HW description / datapath, from the SystemC specification

RTL (+) for LISA: Results
• Problem:
  – Resource sharing between different components.
• Solution:
  – Analyze the ANSI C code of the model with respect to the underlying hardware and the mutual exclusion of operations.
• Results: implementation of the integer pipeline and memory configuration of the LEON processor.

                      Timing (ns)   Gates (KGates)
  Handwritten LEON    3.08          16.7
  Generated LEON      4.42          37.0
Table 1. LEON synthesis results [Oli04]
The generated LEON is 1.44x slower and 2.22x larger than the handwritten one.

RTL (+) for EXPRESSION: Why?
• Automatic generation of a synthesizable HDL design and software toolkit from EXPRESSION.
• More rapid exploration of pipelined embedded processors.
• Same concept, but EXPRESSION was preferred over LISA because:
  – In LISA, the designer has to implement the datapath components manually. Design verification then becomes a major problem, since the operations have to be described and maintained twice: in the LISA model and in the HDL model of the handwritten datapath.

RTL (+) for EXPRESSION: Functional Abstraction
• Enhanced support for functional abstraction
• Fetch unit design
  – Functionality: always an operand read
  – Parameters/architecture: number of operations read per cycle, reservation station size, branch prediction scheme, number of read and write ports
• Using ADLs, we are forced to write a new specification ⇒ no abstraction

    FetchUnit (# of read/cycle, res-station size, ...)
    {
      address = ReadPC();
      instructions = ReadInstMemory(address, n);
      WriteToReservationStation(instructions, n);
      outInst = ReadFromReservationStation(m);
      WriteLatch(decode_latch, outInst);
      pred = QueryPredictor(address);
      if pred {
        nextPC = QueryBTB(address);
        setPC(nextPC);
      } else
        IncrementPC(x);
    }
Figure 12. A fetch unit utilizing sub-functions [Aru04]

RTL (+) for EXPRESSION: The Framework
Figure 13. Modified architecture exploration framework using EXPRESSION [Aru04]

RTL (+) for EXPRESSION: Results
• The generated HDL code consists of three major parts: instruction decoder, datapath, and control logic.
• Results: synthesizable HDL description and rapid exploration of the DLX architecture.

              Spec (words)   HDL code (words)   Area (gates)   Speed (MHz)
  RISC-DLX    2063           6612               118 K          33.0
  PEAS-DLX    1196           6259               105 K          5.3
PEAS-DLX is 6.23x slower than RISC-DLX; RISC-DLX is 1.12x larger than PEAS-DLX.
Table 2. Synthesis results: RISC-DLX vs. PEAS-DLX [Aru04]

LISA++: 1st Improvement
• Behavioral optimization
  LISA model, behavior section:
    if (cc) { R[idx] = value; }
Figure 14. Decision minimization [Oli05]

LISA++: 2nd Improvement
• Structural optimization
Figure 15. Sharing multiplexer implementation [Oli05]

LISA++: Improvement Results
• Results: area and clock period of three implementations (unoptimized, optimized, and original) of the RTL descriptions of ICORE from Infineon and ISS-68HC11 from Motorola.

  Architecture version    Area (KGates)   Clock period (ns)
  ISS-68HC11 (unopt.)     24.52           5.82
  ISS-68HC11 (opt.)       19.55           5.57
  Original M68HC11        15.00           5.00
  ICORE (unopt.)          50.85           6.07
  ICORE (opt.)            39.40           6.08
  ICORE hand-written      42.00           8.00
Table 3. Performance comparison [Oli05]
Optimization makes ISS-68HC11 about 4% faster (1.04x) and 1.25x smaller; ICORE keeps about the same speed and becomes 1.29x smaller.
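
The resource-sharing idea behind these results (operations that are never active in the same cycle can use one functional unit behind operand multiplexers) can be illustrated with a tiny Python sketch. This is an assumption-laden illustration, not the analysis performed in [Oli04] or [Oli05]: the function names, the select signal `sel`, and the returned adder count are hypothetical.

```python
def adder(x, y):
    """Stand-in for one hardware adder instance."""
    return x + y


def unshared(sel, a, b, c, d):
    """Two operations, each with its own adder; a mux selects the result.

    Returns (result, number_of_adders).
    """
    r0 = adder(a, b)  # operation 0 uses adder #1
    r1 = adder(c, d)  # operation 1 uses adder #2
    return (r1 if sel else r0), 2


def shared(sel, a, b, c, d):
    """Same behaviour with one adder, valid because the two operations are
    mutually exclusive: 'sel' activates exactly one of them per cycle.

    Returns (result, number_of_adders).
    """
    x = c if sel else a  # operand multiplexer, left input
    y = d if sel else b  # operand multiplexer, right input
    return adder(x, y), 1
```

Both variants produce the same result for every value of `sel`; the shared version trades one adder for two operand multiplexers, which only pays off when the shared unit is more expensive than the multiplexers.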

Asynchronous Design Support
• Embedded system design is getting more complicated because of the increased demand to maximize achievable system performance in the least possible chip area.
• Process design parameters are becoming more and more variable.
⇒ It is increasingly difficult to implement a single clock scheme that is distributed and synchronized across the whole chip.
• Solution: “go asynchronous”
  – Implement some (or all) of the design using asynchronous circuits.
  – Provide a direct, automated transformation path from synchronous design to asynchronous design (next!).
• Problems with the first solution:
  – Asynchronous circuit design training is needed.
  – There is a severe lack of design automation tools supporting asynchronous circuits.

Transformation Framework: Goal
• Goal: start with a synchronous design and transform the complete system into an asynchronous design using standard logic synthesis tools and a transformation script.
• Advantages:
  – Complete synchronous-to-asynchronous IP block transformation
  – All asynchronous IP blocks are immune to the process variability problem, consume little power, and have no clock skew!
  – An extension to the existing design flow paradigms and tools
  – Saves money because no asynchronous circuit design training is needed

Transformation Framework: Steps
Synchronous Design ⇒ RTL Synthesis ⇒ Transformation Script
• Asynchronous logic definition (LIBRARY astd_logic_1164):
    type astd_logic is record
      one: std_logic;
      zero: std_logic;
    end record;
    type astd_logic_vector is array (integer range <>) of astd_logic;
• Operator overloading:
    function "or" (a, b: astd_logic) return astd_logic is
      variable tmp: astd_logic;
    begin
      tmp.one := (a.zero AND b.one) OR (a.one AND b.one) OR (a.one AND b.zero);
      tmp.zero := a.zero AND b.zero;
      return tmp;
    end "or";
  Asynchronous OR gate [Juh05]
• Transformation steps:
  – Replace std_logic by astd_logic (vectors too).
  – Remove clock wires from ports and replace them with the corresponding asynchronous handshaking interface (flip-flops too).
  – Insert a Muller C-element for synchronization if the design was clocked.

Transformation Framework: Results
• Results for an asynchronous 2-stage pipelined multiplier.

  Design                                 Area (gates)   Trise (ns)   Tfall (ns)
  Synchronous multiplier (low effort)    723            16.88        17.07
  Multiplier (low effort)                2997           28.90        29.01
  Multiplier (high effort)               3632           21.13        21.10
  16-input C-element (high effort)       50             3.40         3.76
The high-effort asynchronous multiplier is 1.25x slower and 5.02x bigger than the synchronous one.
Table 5. Synthesis results for the 2-stage pipelined multiplier [Juh05]

RTL Synthesis for Architecture Exploration: Glue It
• Evaluation statistics (area, power, clock frequency)
• Evaluation statistics (performance)
• Not researched or proposed yet (until now!)
Figure 16. Proposed architecture exploration with AsynLogic glue for the [Aru04] framework

Summary
• The design automation tools currently used to bridge architecture specification and RT level are either inflexible or don't support full system architecture generation.
• The need for synthesis-driven architecture exploration is obvious.
• Asynchronous circuit design provides an alternative or extension to today's complex synchronous design schemes.
• Design automation is and will remain a major part of the design/development process.

References
• [Alb03] Alberto Sangiovanni-Vincentelli: “The Tides of EDA”, in 40th Design Automation Conference, 2003.
• [Fir03] Firaz Samet, M. Anouar Dziri, Flavio Rech Wagner, Wander O. Cesário, Ahmed A. Jerraya: “Combining architecture exploration and a path to implementation to build a complete SoC design flow from system specification to RTL”, in Proceedings of the ASP-DAC, 2003, pp. 219-224.
• [Pra98] Pradip Bose, Thomas M. Conte: “Performance Analysis and Its Impact on Design”, Computer, vol. 31, pp. 41-49, May 1998.
• [Pra06] Prabhat Mishra, Nikil Dutt: “Architecture Description Languages”, to appear in Customizable and Configurable Embedded Processors, Paolo Ienne and Rainer Leupers, Editors, Morgan Kaufmann Publishers, 2006.
• [Aru03] Arun Kejariwal, Prabhat Mishra, Nikil Dutt: “Rapid Exploration of Pipelined Processors through Automatic Generation of Synthesizable RTL Models”, in Proceedings of the Rapid Systems Prototyping Workshop, 2003, pp. 226-232.
• [Oli02] Oliver Schliebusch, Andreas Hoffmann, Achim Nohl, Gunnar Braun, Heinrich Meyr: “Architecture Implementation Using the Machine Description Language LISA”, in Proceedings of the Conference on Asia South Pacific Design Automation/VLSI Design, p. 239, 2002.
• [Oli04] Oliver Schliebusch, A. Chattopadhyay, R. Leupers, G. Ascheid, H. Meyr, et al.: “RTL Processor Synthesis for Architecture Exploration and Implementation”, in Proceedings of the Design Automation and Test in Europe Conference and Exhibition, vol. 3, pp. 156-160, Feb. 2004.
• [Aru04] Arun Kejariwal, Prabhat Mishra, Nikil Dutt: “Synthesis-driven Exploration of Pipelined Embedded Processors”, in Proceedings of the 17th International Conference on VLSI Design, pp. 921-926, 2004.
• [Oli05] Oliver Schliebusch, Anupam Chattopadhyay, Ernst Martin Witte, David Kammler, Gerd Ascheid, Rainer Leupers, Heinrich Meyr: “Optimization Techniques for ADL-driven RTL Processor Synthesis”, in Proceedings of the 16th IEEE International Workshop on Rapid System Prototyping, pp. 165-171, 2005.
• [Juh05] Juha Plosila, Johnny Oberg, Peeter Ellervee: “Automatic synthesis of asynchronous circuits from synchronous RTL descriptions”, in NORCHIP Conference, pp. 200-205, 2005.

Thank you! Any questions?
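
Backup material for the “Transformation Framework: Steps” slide: the dual-rail `astd_logic` record and its overloaded "or" can be modelled in a few lines of Python, together with a two-input Muller C-element. This is a sketch under assumptions, not code from [Juh05]: the names `AStdLogic`, `a_or`, and `c_element` are hypothetical, and reading the pair (one, zero) = (0, 0) as “no data yet” follows the usual dual-rail convention rather than anything stated on the slide.

```python
from typing import NamedTuple


class AStdLogic(NamedTuple):
    """Dual-rail signal mirroring the VHDL record: one rail per logic value."""
    one: int
    zero: int


NULL = AStdLogic(0, 0)   # assumed convention: no valid data on the wires
TRUE = AStdLogic(1, 0)   # logic 1
FALSE = AStdLogic(0, 1)  # logic 0


def a_or(a, b):
    """Same rail equations as the overloaded VHDL "or" function."""
    one = (a.zero & b.one) | (a.one & b.one) | (a.one & b.zero)
    zero = a.zero & b.zero
    return AStdLogic(one, zero)


def c_element(a, b, prev):
    """Two-input Muller C-element: the output follows the inputs when they
    agree and holds its previous value otherwise."""
    return a if a == b else prev
```

With these equations the OR output only becomes valid once both inputs carry data, for example `a_or(TRUE, NULL)` stays at `NULL`; this completion behaviour is what lets a handshake interface replace the clock.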