New Strategies for System Level Design
Daniel Gajski
Center for Embedded Computer Systems (CECS)
University of California, Irvine
gajski@uci.edu
VLSIDAT 2006
Copyright 2006 Daniel D. Gajski

Overview
• Introduction
• Issues
• Models
• Platforms
• Tools
• Benefits
• Conclusion

Closing the System Gap
[Figure: design flows of three eras, each running from specs down to manufacturing. 1960's: Capture & Simulate. 1980's: Describe & Synthesize. 2000's: Specify, Explore & Refine. Levels shown: Specs / Executable Spec, Algorithms, Functionality, Architecture, Network, SW/HW, Design, Logic, Physical, Manufacturing; further labels: Connectivity, Protocols, Performance, Timing. The "System Gap" (marked "SW?") sits between the algorithm level and the design level.]
Real gap: behavior and structure (semantics and syntax)

Simulation Based Methodology
Ambiguous semantics of hardware/system level languages
[Figure: the same code fragment, "case X is when X1 => . . . when X2 => . . .", can be read either as a finite state machine (controller) or as a look-up table (memory); values shown: 3.415, 2.715.]
Simulatable but not synthesizable or verifiable

In Search of a Solution
Algebra: <objects, operations>
a*(b+c) = a*b + a*c
Arithmetic algebra allows creation of expressions and equivalences

Model Algebra
Model algebra: <objects, compositions>
[Figure: a composition of behaviors B1, B2, B3 is equated with a composition of the same behaviors on processing elements PE1 and PE2.]
Model algebra allows creation of models and model equivalences

Specify-Explore-Refine (SER) Methodology
[Figure: system specification model, then design decisions and model refinement through intermediate models (replacement or re-composition), down to a cycle accurate implementation model on an FPGA board.]

How many models?
Minimal set for any methodology (3 is enough)
• System specification model (application designers)
• Transaction-level model (system designers)
• Pin/Cycle accurate model (implementation designers)

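The model-algebra idea above (a specification model and its refinement are two compositions of the same behaviors, and they should be equivalent) can be sketched in C. Everything in this sketch is an illustrative assumption rather than material from the slides: the bodies of b1, b2, b3, the one-slot channel_t, and the choice to map B1 and B2 to PE1 and B3 to PE2. Checking agreement over a range of inputs is only a stand-in for the formal equivalence the talk argues for.

```c
#include <assert.h>

/* Hypothetical behaviors standing in for B1, B2, B3. */
static int b1(int x) { return x + 3; }
static int b2(int x) { return x * 2; }
static int b3(int x) { return x - 5; }

/* Specification model: plain sequential composition B1; B2; B3. */
int spec_model(int input) {
    return b3(b2(b1(input)));
}

/* Hypothetical one-slot channel used only by the refined model. */
typedef struct { int data; int full; } channel_t;

static void ch_send(channel_t *c, int v) {
    assert(!c->full);          /* sender must not overwrite unread data */
    c->data = v;
    c->full = 1;
}

static int ch_recv(channel_t *c) {
    assert(c->full);           /* receiver must find data waiting */
    c->full = 0;
    return c->data;
}

/* Refined model: B1 and B2 run on PE1, B3 runs on PE2 (assumed mapping). */
static void pe1(int input, channel_t *c) { ch_send(c, b2(b1(input))); }
static int  pe2(channel_t *c)            { return b3(ch_recv(c)); }

int refined_model(int input) {
    channel_t c = {0, 0};
    pe1(input, &c);
    return pe2(&c);
}

/* Returns 1 when both models agree on every input in [lo, hi]. */
int models_equivalent(int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        if (spec_model(x) != refined_model(x))
            return 0;
    return 1;
}
```

Calling models_equivalent(-100, 100) compares the two compositions on 201 inputs. The refinement only moves behaviors and inserts a channel; it does not touch the behaviors themselves, which is why equivalence can be argued from the composition rules alone.
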
Three Models (with Respect to OSI)
[Figure: OSI-style layer stacks (7. Application, 6. Presentation, 5. Session, 4. Transport, 3. Network, 2b. Link + Stream, 2a. Media Access Ctrl, 2a. Protocol, 1. Physical) with markers showing the level of the Specification Model (Spec), the Transaction Level Model (TLM), and the Pin/Cycle Accurate Model (P/CAM). The physical layers are connected by address lines, data lines, and control lines.]
Source: G. Schirner

System Specification
[Figure: platform with CPU, Mem, HW, IP, Bridge, and Arbiter; behaviors B1, B2, B3, B4; variable v1; channels C1, C2.]
Computation and communication are captured as:
• Behaviors (in C)
• Variables (in C)
• Channels (in C)
System Definition = (Partial) Platform + (Partial) Spec

Transaction-Level Model (TLM)
[Figure: CPU (B1, B2, OS, Drivers, HAL) and Mem on the CPU Bus; HW (B3) and IP (B4) on the IP Bus.]

Pin/Cycle Accurate Model (P/CAM)
[Figure: CPU (Program, EXE, RTOS, HAL), Mem, IC, Bridge, Arbiter, HW, IP.]
P/CAM is downloaded automatically for fast prototyping with FPGAs or ASIC design
Source: D. Gajski et al.

How many components?
Minimal set for any design (4 is enough?)
• Processing element (PE)
• Memory
• Transducer / Bridge
• Arbiter

General System Model
[Figure: three buses (Bus1, Bus2, Bus3), each with its own arbiter (Arbiter 1, Arbiter 2, Arbiter 3). Components: PE 1.1, PE 1.2, Memory 1, PE 2.1 (Master), PE 2.2 (Slave), PE 3.1, Memory 3. Transducer1-2 and Transducer2-3 link the buses. Interrupt lines: Interrupt1.1, Interrupt2.1, Interrupt2.2, Interrupt3.1, Interrupt3.2.]

Transducer Model
[Figure: a transducer between PE1 (Processor1, Memory1, <clk1>) and PE2 (Processor2, Memory2, <clk2>). Inside the transducer: FSMD1 <clk1>, FSMD2 <clk2>, and a Queue <clk3>. Signals on each side: Addr bus, Data bus, Ready, Ack_ready, Interrupt; data paths Data1 and Data2.]
Source: H. Cho

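The transducer described above (two controllers sharing a queue) can be sketched as a bounded FIFO in C. This is a minimal sketch under stated assumptions, not the design from the slide: the names transducer_t, transducer_write, and transducer_read are hypothetical, the queue depth of 4 is arbitrary, Ready1 and Ready2 are assumed to mean "queue not full" and "queue not empty", and the three clock domains are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 4u            /* hypothetical depth, not from the slides */

typedef struct {
    int    buf[QUEUE_DEPTH];
    size_t head;                  /* index of the oldest stored word */
    size_t count;                 /* number of words currently stored */
} transducer_t;

void transducer_init(transducer_t *t) {
    assert(t != NULL);
    t->head = 0;
    t->count = 0;
}

/* Assumed meaning of Ready1: side 1 may write while the queue is not full. */
int transducer_ready1(const transducer_t *t) { return t->count < QUEUE_DEPTH; }

/* Assumed meaning of Ready2: side 2 may read while the queue is not empty. */
int transducer_ready2(const transducer_t *t) { return t->count > 0; }

/* Side 1 (FSMD1 role): store one word; returns 1 on success, 0 if full. */
int transducer_write(transducer_t *t, int word) {
    if (!transducer_ready1(t))
        return 0;
    t->buf[(t->head + t->count) % QUEUE_DEPTH] = word;
    t->count++;
    return 1;
}

/* Side 2 (FSMD2 role): fetch the oldest word; returns 1 on success, 0 if empty. */
int transducer_read(transducer_t *t, int *word) {
    if (!transducer_ready2(t))
        return 0;
    *word = t->buf[t->head];
    t->head = (t->head + 1) % QUEUE_DEPTH;
    t->count--;
    return 1;
}
```

A PE on bus 1 would call transducer_write and retry while it returns 0; a PE on bus 2 would call transducer_read in the same way. Keeping the queue as the only shared state is what lets each side follow its own bus protocol.
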
Processing Element: NISC technology
• Direct compilation of C to HW (fastest possible execution)
• Statically and dynamically reconfigurable (anytime, anywhere)
• Designed for manufacturability (solving timing closure)
[Figure: NISC architecture. Programmable controller: PC, CMem, CW, AG, const, offset, status. Datapath: RF / Scratch pad, ALU, MUL, Memory (with address input), buses B1, B2, B3. Supported features: multi-cycle units, pipelined units, controller pipelining, datapath pipelining, data forwarding.]

General System Design Environment
[Figure: Model A is turned into Model B by transforms t1, t2, . . . tn. Tools around the models: GUI, Estimation tool, Refinement tool, Synthesis tool, Verify tool, Simulation tool, Component library.]

How many tools?
Minimal set for any methodology (2 is enough?)
• Front-End (for application developers)
  – Input: C, C++, MATLAB, UML, …
  – Output: TLM
• Back-End (for SW/HW system designers)
  – Input: TLM
  – Output: Pin/Cycle accurate Verilog/VHDL

ES Environment
[Figure: ESE Front-End (System Capture + Platform Development) and ESE Back-End (SW Development + HW Development), driven through a Decision User Interface (DUI) and a Validation User Interface (VUI). The front end produces a TIMED model and the back end a CYCLE ACCURATE model. Actions shown: Create, Select, Partition, Map, Compile, Replace, Stimulate, Verify, Check, Simulate; also Compiler and Debugger. Application tools: compilers/debuggers. Commercial tools: FPGA, ASIC.]

Benefit: Spec-to-Prototype in 1 Week

Does it work?
• Intuitively it does
  – Well defined models, rules, transformations, refinements
  – Worked in the past: layout, logic, RTL?
  – System level complexity simplified
• Proof of concept demonstrated
  – Embedded System Environment (ESE)
  – Automatic model generation
  – Model synthesis and verification
  – Universal IP technology (NISC)
  – Productivity gains greater than 1000
• Benefits
  – Large productivity gains
  – Easy design management
  – Easy derivatives
  – Shorter TTM

Design flow with NISC technology
Application:
    /* 8x8 matrix multiplication, C = A x B */
    for(int i=0; i<8; i++)
      for(int j=0; j<8; j++){
        sum=0;
        for(int k=0; k<8; k++)
          sum = sum + A[i][k] * B[k][j];
        C[i][j] = sum;
      }
After code refinement:
    /* inner loop unrolled, indexing replaced by pointer arithmetic */
    for(int i=0; i<8; i++)
      for(int j=0; j<8; j++){
        i8 = i * 8;
        sum = *(A + i8) * *(B + j);
        sum += *(A + i8 + 1) * *(B + 8 + j);
        ...
        C[i][j] = sum;
      }
[Figure: iterative design & refinement loop. Application, NISC Compiler, NISC, Results, with Code Refinement feeding back into the application and NISC Refinement feeding back into the architecture. Two NISC datapaths are shown: one with RF, ALU, MUL, Memory, AG and buses B1, B2, B3; one with aL, bL, AR, DR, P, Sum, Mul, Add, OR, Mem. Both have a controller with PC, CMem, CW, const, offset, status.]
Source: M. Reshadi

Results: DCT with NISC technology
[Figure: bar chart of execution time, power, energy, and area (scale 0 to 1.4) for MIPS, NMIPS, CDCT1 to CDCT7, and Manual.]
Comparison        Performance  Power saving  Energy saving  Area reduction
NMIPS vs. MIPS    1.25X        NA            NA             NA
CDCT3 vs. NMIPS   5.3X         2.1X          11.6X          2.5X
CDCT7 vs. NMIPS   10X          1.3X          12.8X          3X
CDCT7 vs. Manual  0.83X        1.3X          0              2.1X
Source: B. Gorjiara

MP3 on Xilinx with ESE
[Figure: bar chart of % chip utilization (%Slices, %BRAMs; 0 to 100) and execution time in seconds (0 to 0.35) for design points SW+0, SW+1, SW+2, SW+4.]
• Area
  – % of FPGA slices and BRAMs
• Performance
  – Time to decode 1 frame of MP3 data
Source: S. Abdi

MP3 on Xilinx with ESE using NISC
[Figure: bar chart of % chip utilization (%Slices, %BRAMs; 0 to 100) and execution time in seconds (0 to 0.35).]
Design points: SW+0, SW+1, SW+2, SW+4, NISC+0
• Area
  – NISC uses fewer FPGA slices and more BRAMs than manual HW
• Performance
  – NISC comparable to manual HW and much faster than SW

Development Time with ESE
[Figure: bar chart of development time in person-days (0 to 70) for the Spec., TLM, RTL, and Board models, comparing manual development of SW+0, SW+1, SW+2, SW+4 with ESE.]
• Model development time
  – Includes time for C, TLM and RTL Verilog coding and debugging
• ESE drastically cuts RTL and Board development time
Source: S. Abdi

Validation Time with ESE
[Figure: validation time for the Spec., TLM, RTL, and Board models at design points SW+0, SW+1, SW+2, SW+4, with one axis in hours and one in seconds (0 to 10). Bars labelled 18.06 hrs, 17.71 hrs, 17.56 hrs, 15.93 hrs.]
• Simulation time measured on 3.3 GHz processor
• Emulation time measured on board with Timer
• ESE cuts validation time from hours to seconds
Source: S. Abdi

Conclusions
• Extreme makeover is necessary for a new paradigm, where
  – SW = HW = SOC = Embedded Systems
  – Simulation based flow is not acceptable
  – Design methodology is based on scientific principles
• Model algebra is enabling technology for
  – System design, modeling and simulation
  – System synthesis, verification, and test
• What is next?
  – Change of mind
  – Application oriented EDA
  – Looking for early adopters

Thank You
Daniel Gajski
Center for Embedded Computer Systems (CECS)
www.cecs.uci.edu
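
Supplementary sketch: the code refinement step from the NISC design-flow slide, written out as a complete C file. The slide elides most of the unrolled terms, so the six additional terms below are an assumption that simply continues the visible pattern. The function names, the test data, and the use of flat pointers for C as well as A and B are also illustrative choices, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define N 8

/* Application code: plain triple loop, as on the slide. */
void matmul_spec(int A[N][N], int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++)
                sum = sum + A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}

/* Refined code: row-major flat pointers, inner loop fully unrolled. */
void matmul_refined(const int *A, const int *B, int *C) {
    assert(A != NULL && B != NULL && C != NULL);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int i8 = i * 8;                       /* offset of row i */
            int sum = *(A + i8) * *(B + j);
            sum += *(A + i8 + 1) * *(B + 8 + j);
            sum += *(A + i8 + 2) * *(B + 16 + j);
            sum += *(A + i8 + 3) * *(B + 24 + j);
            sum += *(A + i8 + 4) * *(B + 32 + j);
            sum += *(A + i8 + 5) * *(B + 40 + j);
            sum += *(A + i8 + 6) * *(B + 48 + j);
            sum += *(A + i8 + 7) * *(B + 56 + j);
            *(C + i8 + j) = sum;
        }
}

/* Returns 1 when both versions produce the same product on sample data. */
int refinement_matches(void) {
    int A[N][N], B[N][N], C1[N][N], C2[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = i + j;                      /* arbitrary test pattern */
            B[i][j] = i - j + 1;
        }
    matmul_spec(A, B, C1);
    matmul_refined(&A[0][0], &B[0][0], &C2[0][0]);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (C1[i][j] != C2[i][j])
                return 0;
    return 1;
}
```

The refined version has no inner loop counter or loop branch, so every multiply and add is exposed to the compiler for scheduling on the datapath; refinement_matches gives a quick check that the rewrite did not change the result.
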