Reconfigurable HPC
May 14, 2004, TU Tallinn, Estonia
Reiner Hartenstein, TU Kaiserslautern
Reconfigurable HPC, part 4: miscellaneous
© 2004, reiner@hartenstein.de, http://hartenstein.de

Time to Market
• A Fundamental Paradigm Shift in Silicon Application
[Figure: revenue per month over time in months for an ASIC product, a reconfigurable product, and a product with download (Update 1, Update 2)] [Tom Kean]

The next Revolution: Makimoto’s 3rd wave [Keutzer / Newton]
• EDA industry paradigm switching every 7 years
• 82% of designers hate their tools
[Figure: timeline 1978, 1985, 1992, 1999, 2006 of EDA paradigm waves: transistor entry (Applicon, Calma, CV ...); schematics entry (Daisy, Mentor, Valid ...); synthesis (Cadence, Synopsys ...); (co-)compilation and data-stream-based (r)DPAs, reconfigurability [Hartenstein]; labels "Tornado", "Mainstream", "Paradigm Shift"] [Richard Newton]

Software to Configware Migration
• Software to Configware Migration is the most important source of speed-up
• Hardware is just frozen Configware
• This talk illustrates the performance benefit which may be obtained from Reconfigurable Computing (RC)
• Stressing the coarse grain RC point of view, this talk hardly mentions FPGAs (but coarse grain may always be mapped onto FPGAs)

Avoiding specific silicon ...
[Figure: number of design starts per year (0 to 50,000), 2001 to 2004, rGA-based (morphware) versus ASIC] [N. Tredennick, Gilder Technology Report, 2003]

Mega-rGAs
[Figure: system gates per rGA chip (100 to 10 000 000, log scale) by year, 1984 to 2004, with XC 4085XL, XC 40250XV, Virtex, Virtex II, and planned devices] [Xilinx Data]

Embedded hardw. CPU & memory cores on chip
Xputer Lab, University of Kaiserslautern, © 2002, reiner@hartenstein.de, http://kressarray.de
[Figure: HLL, Compiler, FPGA core; HLL, Compiler, CPU core, Memory core] [à la S. Guccione]

Entire system on a single chip
• Xilinx Virtex-II Pro FPGA Architecture
• PowerPC 405 RISC CPU (PPC405) cores
• FPGA fabric based on Virtex-II Architecture
• all you need on board
[Figure labels: Rocket IO, PowerPC core, on-chip memory controller, embedded RAM] Source: Ivo Bolsens, Xilinx

What’s Wrong with This Picture? [Jonathan Rose]
What About PLD Cores on ASICs? (Embedded FPGA Fabric)
1. Still Have to Make the Chip
2. Need Two Sets of Software to Build It
– The ASIC Flow
– The PLD Flow
3. Have No Idea What to Connect the PLD Pins to
– Chances Are, You Are Going to Get It Wrong!

What’s Right with This Picture! [Jonathan Rose]
Embedded CPU, Serial Link, Analog, “etc.”
1. Pre-Fabricated
2. One CAD Tool Flow!
3. Can Connect Anything to Anything
PLDs are built for general connectivity

>> rGAs <<
• rGAs
• Placement & Routing
• Soft Processors
• History of Frameworks
• RTR
• Support by rGA vendors
• EDA
• Future directions
• conclusions
http://www.uni-kl.de

Different Morphware Platforms:
• Reconfigurable Logic Blocks: fine grain reconfigurable
• Reconfigurable Interconnect Blocks: reconfigurable interconnect fabrics
• Reconfigurable Datapath Arrays: coarse grain reconfigurable

rGA with island architecture (detail)
[Figure labels: connect box, switch box, interconnect fabrics, reconfigurable logic block]

Switch box
[Figure labels: switch point, switch box]

Connect box
[Figure labels: connect point, connect box]

Connect point activated
[Figure: connect point (enlarged)]

Switch boxes activated
[Figure labels: 3 switch points, the 4th switch point, the 5th switch point, switch box]

Result
[Figure: routing completed for 1 net between A and B; 20 transistors + 20 flip-flops]
• 1979: Silvar-Lisco (Silicon Valley Research Corp.) offers CALM-P

>> Placement & Routing <<

Routing: long distance
[Figure: net from A to B passing through]
• At a time a path may be used only for one signal ...
• ... Bridges of Königsberg

Routing congestion
[Figure: nets between A and B block C and D]
• C and D are not reachable: C cannot be connected with D
• C and D need another placement
• rLBs are not 100% usable

Euler’s Problem of the Bridges (Leonhard Euler, 1736)
• Königsberg is such a network
• Find a way which crosses each bridge exactly once
• Also an optimization: none of the bridges is unused
L. Euler: Solutio Problematis ad Geometriam Situs Pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128-140
[Figure: graph with nodes Right Bank, Kneiphof Island, Other Island, Left Bank]

Crossbar
Crossbar switch
• 1913: J. N. Reynolds’ crossbar switch
• 1915: patent granted
• 1919: Betulander’s crossbar switch
• 1926: first public telephone switching application in Sweden
• 1964: NASA telemetrics crossbar array

Crossbar complete?
• Crossbar chips available from Aptix, Texas Instruments and others
• One bar connects 2 pins
• Size of a full (complete) switch: n x n / 2
No. of crossbar chips needed (n: crossbar chips in a row):
n | partial (n) | full (n x n / 2)
4 | 4 | 8
100 | 100 | 5000

Detour connection
Routing congestion example with detour
• Direct connection impossible
• Routing through an rLB: identity function configured
• Routing resources: logic gates and/or pass transistors
[Figure: array of rGAs with a net routed through an rLB]

Crossbar-based Architectures
• 1990: UC Berkeley (Jan Rabaey)
• 1993: PADDI-II (Jan Rabaey)
• 1997: Pleiades (mesh & crossbar)
[Figure: eight EXU units, each with CTL, 16 bit and 32 bit paths, connected by a crossbar switch, with I/O]

PADDI-II Architecture
[Figure: processing elements P1 to P48 grouped in 4-PE clusters; Level-1 network (6 x 16b, 16 x 6 switch matrix); Level-2 network (16 x 16b) with break-switches; I/O]

>> Soft Processors <<

FPGA CPUs in teaching and academic research
• UCSC: 1990!
• Märaldalen University, Eskilstuna, Sweden
• Chalmers University, Göteborg, Sweden
• Cornell University
• Gray Research
• Georgia Tech
• Hiroshima City University, Japan
• Michigan State
• Universidad de Valladolid, Spain
• Virginia Tech
• Washington University, St. Louis
• New Mexico Tech
• UC Riverside
• Tokai University, Japan

Some soft CPU core examples
core | architecture | platform
MicroBlaze, 125 MHz, 70 D-MIPS | 32 bit standard RISC, 32 reg. by 32 LUT RAM-based reg. | Xilinx, up to 100 on one FPGA
Nios | 16-bit instr. set | Altera Mercury
Nios, 50 MHz | 32-bit instr. set | Altera, 22 D-MIPS
Nios | 8 bit | Altera Mercury
gr1040 | 16-bit |
gr1050 | 32-bit |
My80 | i8080A | FLEX10K30 or EPF6016
YARD-1A | 16-bit RISC, 2 opd. instr. | old Xilinx FPGA board
xr16 | RISC integer C | SpartanXL
Leon, 25 MHz | SPARC |
ARM7 clone | ARM |
uP1232 | 8-bit CISC, 32 reg. | 200 XC4000E CLBs
REGIS | 8 bits instr. + ext. ROM | 2 Xilinx 3020 LCA
Reliance-1 | 12 bit DSP | Lattice, 4 isp30256, 4 isp1016
1Popcorn-1 | 8 bit CISC | Altera, Lattice, Xilinx
Acorn-1 | | 1 Flex 10K20
DSPuva16 | 16 bit DSP | Spartan-II

It’s a Paradigm Shift!
• Using FPGAs (fine grain reconfigurable) has mainly been classical Logic Synthesis on a “strange hardware” platform
• Coarse Grain Reconfigurable Arrays (rDPAs) (Reconfigurable Computing), however, mean a really fundamental Paradigm Shift
• This is still ignored by CS and EE curricula and almost all R&D scenes

Why the speed-up ...
... although the FPGA clock is slower by a factor of 3 or even more
(most know-how from the "high level synthesis" discipline)
• moving operator to the data stream (before run time)
• support operations: no clock nor memory cycle
• decisions without memory cycles nor clock cycles
• most "data fetch" without memory cycle

>> History of Frameworks <<

Goal: away from complex design flow [à la S. Guccione]
[Figure: Schematics/HDL, Netlister, Netlist, Place and Route, Bitstream; versus HLL, Compiler]

Overcome traditional separate design flow [à la S. Guccione]
[Figure: Schematics/HDL, Netlister, Netlist, Place and Route, Bitstream alongside User Code, Compiler, Executable; versus HLL, Compiler]

Overcome traditional separate co-processing design flow -> JBits Design Flow [à la S. Guccione]
[Figure: traditional flow (Schematics/HDL, Netlister, Netlist, Place and Route, Bitstream; User Code, Compiler, Executable) versus JBits flow (JBits API, User Java Code, Java Compiler, Executable)]

New directions in application development
• automatic partitioning compilers for designer productivity, like CoDe-X (Jürgen Becker, Univ. of Karlsruhe)
• support for Run-Time Reconfiguration (RTR), a key enabler of error handling and fault correction by partially re-routing the FPGA at run time, as well as remote patching for upgrading, remote debugging, and remote repair by reconfiguration, even over the internet

>> RTR <<

CPU use for configuration management
• an on-board microprocessor CPU is available anyhow, even along with a little RTOS
• use this CPU for configuration management
[Figure: HLL, Compiler, RTR System Design]

Hard CPU & memory core on same chip
[Figure: HLL, Compiler, FPGA core; RTR System Design; HLL, Compiler, CPU core, Memory core]

Converging factors for RTR
• Converging factors make RTR-based system design viable
• 1) million-gate FPGA devices and co-processing with standard microprocessors are commonplace
– direct implementation of complex algorithms in FPGAs
– this alone has already revolutionized FPGA design
• 2) new tools like the Xilinx JBits API software tool suite directly support co-processing and RTR
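The idea of letting the on-board CPU do configuration management for RTR can be sketched in a few lines. This is only an illustration under stated assumptions, not the JBits API: the names `Fabric`, `ConfigurationManager`, `register`, and `run`, the stage names, and the treatment of a bitstream as an opaque `bytes` object are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Fabric:
    """Hypothetical stand-in for a reconfigurable fabric (not a real device API)."""
    loaded: Optional[str] = None   # name of the configuration currently resident
    reconfigurations: int = 0      # number of configuration downloads so far

    def load(self, name: str, bitstream: bytes) -> None:
        # A real device would receive the bitstream here; the sketch only records it.
        self.loaded = name
        self.reconfigurations += 1


@dataclass
class ConfigurationManager:
    """Runs on the host CPU and swaps stage configurations in at run time."""
    fabric: Fabric
    library: Dict[str, bytes] = field(default_factory=dict)

    def register(self, name: str, bitstream: bytes) -> None:
        self.library[name] = bitstream

    def run(self, stages: List[str]) -> List[str]:
        trace = []
        for stage in stages:
            # Reconfigure only if the required stage is not already resident.
            if self.fabric.loaded != stage:
                self.fabric.load(stage, self.library[stage])
            trace.append(stage)    # the stage would execute on the fabric here
        return trace


fabric = Fabric()
manager = ConfigurationManager(fabric)
for name in ("filter", "fft", "encode"):
    manager.register(name, b"placeholder-bitstream")
trace = manager.run(["filter", "filter", "fft", "encode"])
```

Skipping the reload when a stage is already resident is one simple way to reduce reconfiguration latency; a real run-time system would also have to handle partial reconfiguration and overlap loading with execution.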
[Figure: JBits flow with User Java Code, Java Compiler, Executable]

RTR
• divides an application into a series of sequentially executed stages, each mapped as a separate execution module
• Excellent example: Xtreme platform by PACT AG, Munich
• Without RTR, all configurable platforms are just ASIC emulators
• directly support development and debugging of RTR applications
• will also heavily influence the future system organization

>> Support by rGA vendors <<

Support by FPGA Vendors
• Xilinx
– Software by Xilinx
– Configware (soft IP cores)
– Hardware
• Altera
– Software
– Configware
– Hardware

Xilinx
• fabless FPGA semiconductor vendor, San Jose, CA, founded 1984
• key patents on FPGAs (expiring in a few years)
• Fortune 2001: No. 14 Best Company to work for (Intel: no. 42, HP: no. 64, TI: no. 65)
• DARPA grant (Nov ’99) to develop JBits API tools for internet reconfigurable / upgradable logic (with VT)
• Less brilliant early/mid 90s (president Curt Wozniak): 1995 market share from 84% down to 62% [Dataquest]
• As designs got larger, Xilinx lost its advantage (bug fixes did not require burning new chips)
• meanwhile, weeks of expensive debug time are needed

Software by Xilinx
• Full design flow from Cadence, Mentor, and Synopsys
• Xilinx Software AllianceEDA Program:
– Alliance Series Development System
– Foundation Series Development Systems
– Xilinx Foundation Series ISE (Integrated Synthesis Environment)
– free WebPOWERED SW with WebFitter & WebPACK-ISE
– StateCAD XE and HDL Bencher
– Foundation Base Express
– Foundation ISE Base Express
• More: ModelSim Xilinx Edition (ModelSim XE), Forge Compiler, Modular Design, ChipScope ILA, The Xilinx System Generator, XPower, JBits SDK, The Xilinx XtremeDSP Initiative, MathWorks / Xilinx Alliance, The Wind River / Xilinx alliance

Configware (soft IP Products)
• For libraries, creation and reuse of configware
• To search for IPs see: List of all available IP
• The AllianceCORE program is a cooperation between Xilinx and third-party core developers
• The Xilinx Reference Design Alliance Program
• The Xilinx University Program
• LogiCORE soft IP with LogiCORE PCI Interface
• Consultants

Xilinx hardware
• Virtex, Virtex-II, first with 1 million system gates
– Virtex-E series: > 3 million system gates
– Virtex-EM: on a copper process, with additional on-chip memory for network switch applications
– The Virtex XCV3200E: > 3 million gates, 0.15-micron technology
• Spartan, Spartan-XL, Spartan-II
– for low-cost, high volume applications as ASIC replacements
– multiple I/O standards, on-chip block RAM, digital delay lock loops
– eliminate phase lock loops, FIFOs, I/O translators, system bus drivers
• XC4000XV, XC4000XL/XLA, CPLD: low-cost families
– rapid development, longer system life, robust field upgradability
– support In-System Programming (ISP), in-board debugging
– test during manufacturing, field upgrades, full JTAG compliant interface
• CoolRunner: low power, high speed/density, standby mode
• Military & Aerospace: QPRO high-reliability, QML certified
• Configuration Storage Devices

Altera
• Altera was founded in June 1983
• EDA: synthesis, place & route, and verification
• Quartus II: APEX, Excalibur, Mercury, FLEX 6000 families
• MAX+PLUS II: FLEX, ACEX & MAX families
• Flow with Quartus II: Mentor Graphics, Synopsys, Synplicity deliver design software to support Altera SOPC solutions
• Mentor: only EDA vendor with a complete design environment for APEX II incl. IP, design capture, simulation, synthesis, and h/s co-verification
• Configware: Altera offers over a hundred IP cores
• Third party IP core design services and consultants

Altera hardware
• Newer families: APEX 20KE, APEX 20KC, APEX II, MAX 7000B, ACEX 1K, Excalibur, Mercury families
– APEX EP20K1500E (0.18-µ), up to 2.4 million system gates
– APEX II (all-copper 0.13-µ) for data path applications, supports many I/O standards, 1-Gbps True-LVDS performance
– wQ2001, an ARM-based Excalibur device
• Altera mainstream: MAX 7000A, 3000A; FLEX 6000, 10KA, 10KE; APEX 20K families
• Mature and other: Classic, MAX 7000, 7000S, 9000; FLEX 8000, 10K families

>> EDA <<
• EDA as the Key Enabler (major EDA vendors)
– Altera
– Cadence
– Mentor Graphics
– Synopsys
– Xilinx
• Changing EDA Tools Market

EDA as the Key Enabler (major EDA vendors)
• Select EDA quality / productivity, not FPGA architectures
• EDA often has massive software quality problems
• Customer: highest priority is an EDA center of excellence
– collecting EDA expertise and EDA user experience
– to assemble the best possible tool environments
– for optimum support of design teams
– to cope with interoperability problems
– to keep track of the EDA scene as a rapidly moving target
• being fabless, FPGA vendors spend their most qualified manpower on development of EDA, IP cores, applications, support
• Xilinx and Altera are morphing into EDA companies

Cadence
• FPGA Designer: top-down FPGA design system
• high-level mapping, architecture-specific optimization
• Verilog, VHDL, schematic-level design entry
• Verilog, VHDL to Synergy (logic synthesis) and FPGA Designer
• FPGAs simulated by themselves using Cadence's Verilog-XL or Leapfrog VHDL simulators, and
• simulated with the rest of the system design using the Logic Workbench board/system verification environment
• Libraries for the leading FPGA manufacturers

Mentor Graphics
• System Design and Verification
• PCB design and analysis
• IC Design and Verification
• shifts ASIC design flow to FPGAs (Altera, Xilinx)
– by FPGA Advantage with IP support
– by ModuleWare
– Xilinx CORE Generator
– Altera MegaWizard integration

Synopsys
• FPGA Compiler II
• Version of ASIC Design Compiler Ultra
• Block Level Incremental Synthesis (BLIS)
• ASIC <-> FPGA migration
• Actel, Altera, Atmel, Cypress, Lattice, Lucent, Quicklogic, Triscend, Xilinx

RTR
• divides an application into a series of sequentially executed stages, each implemented as a separate execution module
• Partial RTR partitions these stages into finer-grain sub-modules to be swapped in as needed
• Without RTR, all configurable platforms are just ASIC emulators
• needs a new kind of application development environment
– directly supporting development and debugging of RTR applications
– essential for the advancement of configurable computing
• will also heavily influence the future system organization
• Xilinx, VT, BYU work on run-time kernels, run-time support, RTR debugging tools and other associated tools
• smaller, faster circuits, simplified hardware interfacing, fewer IOBs; smaller, cheaper packages, simplified software interfaces

Run-time Mapping
• run-time reconfigurable are: the Xilinx Virtex FPGA family, and the RAs being part of Chameleon CS2000 series systems
• Using such devices changes many of the basic assumptions in the HW/SW co-design process:
– host/RL interaction is dynamic and needs a tiny OS like eBIOS, also to organize RL reconfiguration under host control
– a typical goal is minimization of reconfiguration latency (especially important in communication processors), to hide configuration loading latency
– scheduling to find the ’best’ schedule for eBIOS calls (C~side)

>> Future directions <<

Soft CPU: new job for compilers
[Figure: HLL, Compiler, soft CPU and Memory on an FPGA core]

Soft rDPA feasible? [à la S. Guccione]

Array I/O examples
• data streams, or, from / to embedded memory banks
[Figure: Processor-Memory Performance Gap; performance (1 to 1000, log scale) from 1980 to 2000; µProc (CPU) 60%/yr., DRAM 7%/yr.; the gap grows 50% / year] [à la S. Guccione]

HLL 2 Soft Array [à la S. Guccione]
[Figure: HLL, Compiler, soft CPU, Memory; miscellaneous]

HLL 2 "flex" rDPA [à la S. Guccione]
[Figure: HLL, Compiler, CPU, Memory; miscellaneous]

>> HLLs <<

HLLs for Hardware Design vs. System Design vs. RTR System Design [à la S. Guccione]
[Figure: one HLL and Compiler each for Hardware Design, System Design, and RTR System Design]

CPU and memory on Chip [à la S. Guccione]
[Figure: HLL, Compiler, FPGA core; RTR System Design; HLL, Compiler, CPU core, Memory core]

JBits Environment [à la S. Guccione]
[Figure: User Code on top of JBits API and JRoute API, RTP Core Library, BoardScope Debugger, XHWIF, TCP/IP, Device Simulator]

Embedded System Design [à la S. Guccione]
[Figure: FPGA core with CPU core and Memory core, HLL, Compiler; FPGA with soft CPU and Memory core, HLL, Compiler]

>> conclusions <<

Missing the next revolution
Ignoring reconfigurable computing when teaching computing fundamentals within our CS curricula is one of the biggest mistakes in the history of information technology application, causing the waste of billions of dollars.

"EDA industry shifts into CS mentality" [Wojciech Maly]
• Microprogramming to replace FSM design
• Hardware languages replace EE-type schematics
• EDA software and its interfacing languages
• Newer system level languages like SystemC etc.
• Small and large module re-use
• Hierarchical organization of designs, EDA, et al.
• ...
• Which language to select?

Roadmap
old CS lab course philosophy:
given an application: implement it by a program
new CS freshman lab course environment:
given an application:
a) implement it by writing a program
b) implement it as a morphware prototype
c) partition it into P and Q
c.1) implement P by software
c.2) implement Q by morphware
c.3) implement the P / Q communication interface

All enabling technologies are available
• literature from the last 30 years
• languages & (co-)compilation techniques
• anti machine and all its architectural resources
• parallel memory IP cores and generators
• morphware vendors like PACT ....
• anything else needed

END

The dichotomy of models
• Note for von Neumann: the state register is with the CPU
• Note for the anti machine: the state register is with the memory bank (state registers are within memory banks)

Machine Paradigms
machine category | Computer (the Machine: “v. Neumann”) | The Anti Machine
driven by | instruction streams | data streams (no “dataflow”)
engine principles | instruction sequencing | sequencing data streams
state register | single program counter | (multiple) data counter(s)
communication path set-up | at run time (“instruction fetch”) | at load time
resource (data path) | DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.
operation | sequential | parallel pipe network etc.
The Anti Machine: also hardwired implementations*
*) e.g. Bee project, Prof. Brodersen

Benefit from RAM-based & 2nd paradigm
1) RAM-based platform needed for:
• flexibility, programmability
• avoiding the need of specific silicon; mask cost: currently 2 million $, rapidly growing
2) simple 2nd machine paradigm needed as a common model:
• to avoid the need of circuit expertise
• needed to educate zillions of programmers

Design Space Exploration Systems
Explorer System | year | source | interactive | status evaluation | status generation
DPE | 1991 | [66] | no | abstract models | rule-based
Clio | 1992 | [67] | yes | prediction models | device generator
DIA | 1998 | [68] | yes | prediction from library | rule-based
DSE for RAW | 1998 | [49] | no | analytical models | analytical
ICOS | 1998 | [76] | no | fuzzy logic | greedy search
DSE for Multimedia | 1999 | [77] | no | simulation | branch and bound
Xplorer | 1999 | [11] [50] | yes | fuzzy rule-based | simulated annealing
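
The Machine Paradigms comparison above (program counter and instruction fetch at run time versus data counters and a data path set up at load time) can be illustrated with a toy dot product. This is a loose sketch under stated assumptions, not a model of any real machine: the function names, the five-step "program", and the flat `memory` list are hypothetical.

```python
def von_neumann_dot(memory, a_base, b_base, n):
    """Instruction-stream style: one program counter sequences every step."""
    program = ["LOAD_A", "LOAD_B", "MUL", "ACC", "NEXT"]
    pc = i = acc = fetches = 0
    x = y = p = 0
    while i < n:
        instr = program[pc]        # instruction fetch at run time
        fetches += 1
        if instr == "LOAD_A":
            x = memory[a_base + i]
        elif instr == "LOAD_B":
            y = memory[b_base + i]
        elif instr == "MUL":
            p = x * y
        elif instr == "ACC":
            acc += p
        elif instr == "NEXT":
            i += 1
        pc = (pc + 1) % len(program)
    return acc, fetches


def data_counter(base, n, stride=1):
    """Address generator: the state register sits with the memory bank."""
    for k in range(n):
        yield base + k * stride


def anti_machine_dot(memory, a_base, b_base, n):
    """Data-stream style: the data path is fixed before the streams start."""
    def dpu(x, y, acc):            # multiply-accumulate, configured at load time
        return acc + x * y

    acc = 0
    # Two data counters drive the streams; no instruction is fetched per item.
    for addr_a, addr_b in zip(data_counter(a_base, n), data_counter(b_base, n)):
        acc = dpu(memory[addr_a], memory[addr_b], acc)
    return acc


memory = [1, 2, 3, 4, 5, 6]
sequential_result, instruction_fetches = von_neumann_dot(memory, 0, 3, 3)
stream_result = anti_machine_dot(memory, 0, 3, 3)
```

Both functions compute the same value; the difference the sketch tries to show is where the sequencing state lives, in a program counter stepping through instructions or in data counters stepping through memory.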