ECE 448 Lecture 14
ASIC Design
ECE 448 – FPGA and ASIC Design with VHDL
George Mason University

Two competing implementation approaches

ASIC (Application Specific Integrated Circuit)
• designed all the way from behavioral description to physical layout
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry

FPGA (Field Programmable Gate Array)
• no physical layout design; design ends with a bitstream used to configure a device
• bought off the shelf and reconfigured by designers themselves

FPGAs vs. ASICs

ASICs:
• High performance
• Low power
• Low cost (but only in high volumes)

FPGAs:
• Off-the-shelf
• Low development costs
• Short time to market
• Reconfigurability

ASIC Design Example – Factoring circuit/GMU
[Figure: chip layout with Global Memory and Local Memory regions labeled]

ASIC 130 nm vs. Virtex II 6000 – Factoring/GMU
• Area of Xilinx Virtex II 6000 FPGA: 19.68 mm × 19.80 mm (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004)
• Area of an ASIC with equivalent functionality: 2.7 mm × 2.82 mm
• Area ratio: 51x

ASICs vs. FPGAs
Source: I. Kuon, J. Rose, University of Toronto, “Measuring the Gap Between FPGAs and ASICs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb 2007.

ASICs vs. FPGAs
23 representative circuits implemented using FPGAs and ASICs
- computer arithmetic (booth, cordic18, cordic8, etc.)
- digital signal processing (rs_encoder, fir3, fir24, etc.)
- communications (ethernet, mac1, atm, etc.)
- cryptography (des_area, des_perf, aes, aes192, etc.)
- scientific computations (molecular, raytracer, etc.)
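Returning to the factoring-circuit comparison (ASIC 130 nm vs. Virtex II 6000), the 51x area ratio can be checked directly from the quoted die dimensions. The sketch below is only an illustration: it assumes plain rectangular die areas, and the variable names are hypothetical.

```python
# Die dimensions in mm, as quoted on the area comparison slide.
fpga_w, fpga_h = 19.68, 19.80   # Xilinx Virtex II 6000 (estimated)
asic_w, asic_h = 2.7, 2.82      # 130 nm ASIC with equivalent functionality

# Assumption: both dies are treated as simple rectangles.
fpga_area = fpga_w * fpga_h     # mm^2
asic_area = asic_w * asic_h     # mm^2
ratio = fpga_area / asic_area   # how many times larger the FPGA die is

print(f"FPGA area: {fpga_area:.2f} mm^2")
print(f"ASIC area: {asic_area:.2f} mm^2")
print(f"Ratio:     {ratio:.1f}x")
```

The computed ratio rounds to the 51x stated on the slide.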
Simplified ASIC Design Flow

Front-End Design:
• Synthesis
• Timing Analysis

Back-End Design:
• Floorplanning
• Placement
• Clock Tree Synthesis
• Routing
• Design for Manufacturing

Major ASIC Toolsets
• Cadence
• Magma

Simplified ASIC Design Flow – Synopsys Tools

Front-End Design:
• Synthesis – Design Compiler
• Timing Analysis – Primetime

Back-End Design:
• Floorplanning, Placement, Clock Tree Synthesis, Routing – Astro
• Design for Manufacturing

A Complete Placed and Routed Chip
[Figure: placed and routed chip, with an IP block labeled]

What is “Physical Layout”?
[Figure: the same circuit (VDD, PMOS, NMOS, GND, IN, OUT) shown in Transistor or Device View and in Physical or Layout View]
Physical Layout – Topography of devices and interconnects, made up of polygons that represent different layers of material (diffusion, polysilicon, metal, contact, etc.)

Process of Device Fabrication
• Devices are fabricated vertically on a silicon substrate wafer by layering different materials in specific locations and shapes on top of each other
• Each of many process masks defines the shapes and locations of a specific layer of material (diffusion, polysilicon, metal, contact, etc.)
• Mask shapes, derived from the layout view, are transformed to silicon via photolithographic and chemical processes
[Figure: silicon substrate shown in Layout or Mask (aerial) view and in Wafer (cross-sectional) view]

Wafer Representation of Layout Polygons
[Figure: 0.25 um layout with PMOS, NMOS, Input, Output, VDD, and GND labeled, shown in Aerial or Layout View and in Wafer Cross-sectional View]

Front-End Design Flow
Simplified RTL Synthesis
• Write RTL HDL code
• Simulate the HDL; if the result is not OK, revise the code
• Synthesize RTL code to gates, producing a gate-level netlist
• Check whether constraints are met; if not, iterate
• Run gate-level testing; if the result is not OK, iterate
• Proceed with back-end processing

VHDL vs. Verilog

VHDL                    Verilog
Government developed    Commercially developed
Ada based               C based
Strongly typed          Mildly typed
Difficult to learn      Easy to learn
More powerful           Less powerful

Logic Synthesis
VHDL description → Circuit netlist

    architecture MLU_DATAFLOW of MLU is
      signal A1: STD_LOGIC;
      signal B1: STD_LOGIC;
      signal Y1: STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    begin
      A1 <= A when (NEG_A='0') else not A;
      B1 <= B when (NEG_B='0') else not B;
      Y  <= Y1 when (NEG_Y='0') else not Y1;
      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;
      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;

Basic Synthesis Flow

Synthesis using Design Compiler

Script Language: TCL – Tool Command Language
• Created by John Ousterhout of UC Berkeley
• Scripting Language
    • Makes it very simple to automate routine tasks
• Extension Language
    • Used to customize tools with user/company specific applications
• Nearly all modern EDA tools have a TCL interface
• Very simple to learn and use

TCL References
• Practical Programming in Tcl and Tk
    • Brent B. Welch
    • Ken Jones
• Tcl/Tk in a Nutshell
    • Paul Raines
    • Jeff Tranter

Synthesis script (1)

    designer = "Pawel Chodowiec"
    company = "George Mason University"
    search_path = "./opt3/synopsys/TSMCHOME/digital/Front_End/timing_power/tcb013ghp_200a "
    link_library = "* tcb013ghptc.db"
    /* Typical case library */
    target_library = "tcb013ghptc.db "
    symbol_library = "tcb013ghp.sdb "

    /* Directory configuration */
    src_directory = ~/exam1/vhdl/
    report_directory = ~/exam1/reports/
    db_directory = ~/exam1/db/

Synthesis script (2)

    /* Packages can be only read */
    read_file -format vhdl -rtl src_directory + "components.vhd"

    blocks = {regne, upcount, RAM_16Xn_DISTRIBUTED, exam1}
    foreach (block, blocks) {
      block_source = src_directory + block + ".vhd"
      read_file -format vhdl -rtl block_source
      analyze -format vhdl -lib WORK block_source
    }
    current_design block
    /* All commands now apply to the entity "exam1" */

Synthesis script (3)

    uniquify      /* Creates unique instances of multiply referenced entities */
    link
    check_design  /* Checks the current design for consistency */

    /*******************************************/
    /* apply block attributes and constraints  */
    /*******************************************/
    create_clock -period 10 clk
    /* Defines that the port "clk" of the current design is the clock for the design.
       Period=10ns
       50% duty cycle
       Use -waveform option to define duty cycle other than 50% */
    set_operating_conditions NCCOM  /* Normal Case Commercial Operating Conditions */

Synthesis script (4)

    /***************************************************/
    /* Apply these constraints to the top-level entity */
    /***************************************************/
    set_max_fanout 100 block
    set_clock_latency 0.1 find(clock, "clk")
    set_clock_transition 0.01 find(clock, "clk")
    set_clock_uncertainty -setup 0.1 find(clock, "clk")
    set_clock_uncertainty -hold 0.1 find(clock, "clk")
    set_load 0 all_outputs()
    set_input_delay 1.0 -clock clk -max all_inputs()
    set_output_delay -max 1.0 -clock clk all_outputs()
    set_wire_load_model -library tcb013ghptc -name "TSMC8K_Fsg_Conservative"

Wireload model basics (1)

Wireload model basics (2)

Synthesis script (5)

    set_dont_touch block
    compile -map_effort medium
    change_names -rules vhdl
    vhdlout_architecture_name = "sort_syn"
    vhdlout_use_packages = {"IEEE.std_logic_1164"}
    write -f db -hierarchy -output db_directory + "exam1.db"
    /*write -f vhdl -hierarchy -output db_directory + "exam1_syn.vhd"*/
    report -area > report_directory + "exam1.report_area"
    report -timing -all > report_directory + "exam1.report_timing"

Results of synthesis

Area report after synthesis (1)

    report_area
    Information: Updating design information...
    (UID-85)
    ****************************************
    Report : area
    Design : exam1
    Version: V-2003.12-SP1
    Date   : Tue Nov 15 20:39:06 2005
    ****************************************
    Library(s) Used:
        tcb013ghptc (File: /opt3/synopsys/TSMCHOME/digital/Front_End/timing_power/tcb013ghp_200a/tcb013ghptc.db)

Area report after synthesis (2)

    Number of ports:          75
    Number of nets:          346
    Number of cells:         107
    Number of references:     28

    Combinational area:       10593.477539
    Noncombinational area:    14295.521484
    Net Interconnect area:    undefined (Wire load has zero net area)

    Total cell area:          24888.976562
    Total area:               undefined

Critical Path (1)
• Critical Path – The longest path from outputs of registers to inputs of registers
[Figure: in → flip-flop (D, Q) → logic with delay t_{logic} → flip-flop (D, Q) → out, both flip-flops clocked by clk]

    t_{Critical} = t_{FF-P} + t_{logic} + t_{FF-setup}

Critical Path (2)
• Min. Clock Period = Length of the Critical Path
• Max. Clock Frequency = 1 / Min. Clock Period

Clock Jitter
• The rising edge of the clock does not occur precisely periodically
• May cause faults in the circuit
[Figure: clk waveform]

Clock Skew
• The rising edge of the clock does not arrive at the clock inputs of all flip-flops at the same time
[Figure: flip-flop chains (in, D, Q, out) with a delay element on the clk line]

Timing report after synthesis (1)

    ****************************************
    Report : timing
            -path full
            -delay max
            -max_paths 1
    Design : exam1
    Version: V-2003.12-SP1
    Date   : Tue Nov 15 20:39:06 2005
    ****************************************
    Operating Conditions: NCCOM   Library: tcb013ghptc
    Wire Load Model Mode: segmented

Timing report after synthesis (2)

    Startpoint: in_addr(1) (input port clocked by clk)
    Endpoint: RegSUM/Q_reg[34] (rising edge-triggered flip-flop clocked by clk)
    Path Group: clk
    Path Type: max

    Des/Clust/Port          Wire Load Model            Library
    --------------------------------------------------------------
    exam1                   TSMC8K_Fsg_Conservative    tcb013ghptc
    RAM_16Xn_DISTRIBUTED    ZeroWireload               tcb013ghptc
    exam1_DW01_cmp2_32_0    ZeroWireload               tcb013ghptc
    exam1_DW01_cmp2_32_1    ZeroWireload               tcb013ghptc
    exam1_DW01_add_35_0     ZeroWireload               tcb013ghptc
    regne_1                 ZeroWireload               tcb013ghptc
    regne_2                 ZeroWireload               tcb013ghptc
    regne_n35               ZeroWireload               tcb013ghptc

Timing report after synthesis (3)

    Point                                              Incr    Path
    ------------------------------------------------------------------
    clock clk (rise edge)                              0.00    0.00
    clock network delay (ideal)                        0.10    0.10
    input external delay                               1.00    1.10 f
    in_addr(1) (in)                                    0.00    1.10 f
    U98/Z (CKMUX2D1)                                   0.13    1.23 f
    Memory/ADDR[1] (RAM_16Xn_DISTRIBUTED)              0.00    1.23 f
    Memory/U41/ZN (INVD1)                              0.08    1.31 r
    Memory/U343/Z (OR3D1)                              0.10    1.41 r
    Memory/U338/ZN (INVD2)                             0.20    1.61 f
    Memory/U40/ZN (MOAI22D0)                           0.17    1.78 f
    Memory/U350/Z (OR4D1)                              0.26    2.03 f
    Memory/DATA_OUT[0] (RAM_16Xn_DISTRIBUTED)          0.00    2.03 f

Timing report after synthesis (4)

    add_96xplusxplus/B[0] (exam1_DW01_add_35_0)        0.00    2.03 f
    add_96xplusxplus/U9/Z (AN2D0)                      0.12    2.15 f
    add_96xplusxplus/U1_1/CO (CMPE32D1)                0.10    2.25 f
    add_96xplusxplus/U1_2/CO (CMPE32D1)                0.10    2.34 f
    add_96xplusxplus/U1_3/CO (CMPE32D1)                0.10    2.44 f
    add_96xplusxplus/U1_4/CO (CMPE32D1)                0.10    2.54 f
    add_96xplusxplus/U1_5/CO (CMPE32D1)                0.10    2.63 f
    add_96xplusxplus/U1_6/CO (CMPE32D1)                0.10    2.73 f
    add_96xplusxplus/U1_7/CO (CMPE32D1)                0.10    2.82 f
    add_96xplusxplus/U1_8/CO (CMPE32D1)                0.10    2.92 f
    add_96xplusxplus/U1_9/CO (CMPE32D1)                0.10    3.02 f
    add_96xplusxplus/U1_10/CO (CMPE32D1)               0.10    3.11 f
    add_96xplusxplus/U1_11/CO (CMPE32D1)               0.10    3.21 f
    add_96xplusxplus/U1_12/CO (CMPE32D1)               0.10    3.31 f
    add_96xplusxplus/U1_13/CO (CMPE32D1)               0.10    3.40 f
    add_96xplusxplus/U1_14/CO (CMPE32D1)               0.10    3.50 f
Timing report after synthesis (5)

    add_96xplusxplus/U1_15/CO (CMPE32D1)               0.10    3.60 f
    add_96xplusxplus/U1_16/CO (CMPE32D1)               0.10    3.69 f
    add_96xplusxplus/U1_17/CO (CMPE32D1)               0.10    3.79 f
    add_96xplusxplus/U1_18/CO (CMPE32D1)               0.10    3.88 f
    add_96xplusxplus/U1_19/CO (CMPE32D1)               0.10    3.98 f
    add_96xplusxplus/U1_20/CO (CMPE32D1)               0.10    4.08 f
    add_96xplusxplus/U1_21/CO (CMPE32D1)               0.10    4.17 f
    add_96xplusxplus/U1_22/CO (CMPE32D1)               0.10    4.27 f
    add_96xplusxplus/U1_23/CO (CMPE32D1)               0.10    4.37 f
    add_96xplusxplus/U1_24/CO (CMPE32D1)               0.10    4.46 f
    add_96xplusxplus/U1_25/CO (CMPE32D1)               0.10    4.56 f
    add_96xplusxplus/U1_26/CO (CMPE32D1)               0.10    4.66 f
    add_96xplusxplus/U1_27/CO (CMPE32D1)               0.10    4.75 f
    add_96xplusxplus/U1_28/CO (CMPE32D1)               0.10    4.85 f
    add_96xplusxplus/U1_29/CO (CMPE32D1)               0.10    4.94 f
    add_96xplusxplus/U1_30/CO (CMPE32D1)               0.10    5.04 f
    add_96xplusxplus/U1_31/CO (CMPE32D1)               0.10    5.14 f

Timing report after synthesis (6)

    add_96xplusxplus/U7/Z (AN2D0)                      0.10    5.24 f
    add_96xplusxplus/U5/Z (AN2D0)                      0.08    5.32 f
    add_96xplusxplus/U4/Z (CKXOR2D0)                   0.15    5.47 f
    add_96xplusxplus/SUM[34] (exam1_DW01_add_35_0)     0.00    5.47 f
    RegSUM/R[34] (regne_n35)                           0.00    5.47 f
    RegSUM/U32/Z (AO21D0)                              0.11    5.57 f
    RegSUM/Q_reg[34]/D (EDFQD1)                        0.00    5.57 f
    data arrival time                                          5.57

Timing report after synthesis (7)

    clock clk (rise edge)                             10.00   10.00
    clock network delay (ideal)                        0.10   10.10
    clock uncertainty                                 -0.10   10.00
    RegSUM/Q_reg[34]/CP (EDFQD1)                       0.00   10.00 r
    library setup time                                -0.12    9.88
    data required time                                         9.88
    ------------------------------------------------------------------
    data required time                                         9.88
    data arrival time                                         -5.57
    ------------------------------------------------------------------
    slack (MET)                                                4.31

Static Timing Analysis

Static Timing Analysis Review
• Tools calculate all paths from each sequential start point to each sequential end point.
• The worst-case (longest) path is used for setup analysis, and the best-case (shortest) path is used for hold analysis.
• All paths are considered for design rule checking.

Review of Setup and Hold Checks

False and Multicycle paths
• False path
    • Very slow signals, like reset or test mode enable, that are not used under normal conditions are classified as false paths
• Multicycle path
    • Paths that take more than one clock cycle are known as multicycle paths
    • You have to define the multicycle paths in the analyzer, and the tool takes those constraints into account when synthesizing

Multicycle path - Example

Optimization criteria

Degrees of freedom and possible trade-offs
[Figure: trade-off diagram among speed, area, power, and testability]

Degrees of freedom and possible trade-offs
[Figure: trade-off diagram among speed, latency, area, and throughput]

VHDL Coding for Synthesis

Recommended rules for Synthesis
• When implementing combinational paths, do not use hierarchy
• Register all outputs
• Do not implement glue logic between blocks; partition them well
• Separate designs on functional boundaries
• Keep block sizes reasonable

Avoid hierarchical combinational blocks
[Figure: reg1, Combinatorial Logic1, Combinatorial Logic2, Combinatorial Logic3, and reg2 spread across Block A, Block B, and Block C]
Not recommended design practice
• The path between reg1 and reg2 is divided among three different blocks
• Due to the hierarchical boundaries, the combinational logic cannot be optimized as a whole
• Synthesis tools (Synopsys) maintain the integrity of the I/O ports, so combinational optimization cannot be achieved between blocks (unless “grouping” is used)
Recommended way to handle Combinational Paths
[Figure: Block A contains reg1; Block C contains Combinatorial Logic1 & Logic2 & Logic3 and reg2]
Recommended practice
• All the combinational circuitry is grouped in the same block, whose output is connected to the destination flip-flop
• Allows optimal minimization of the combinational logic during synthesis
• Allows a simplified description of the timing interface

Register all outputs
[Figure: Block X and two Block Y instances, with registers reg1, reg2, and reg3]
• Simplifies the synthesis design environment: inputs to the individual blocks arrive within the same relative delay (caused by wire delays)
• There is no real need to specify output requirements, since paths start at flip-flop outputs
• Take care of fanout: as a rule of thumb, keep the fanout to 16 (dependent on the technology and on the components being driven by the output)

NO GLUE LOGIC between blocks
[Figure: Top containing Block X (reg1) and Block Y (reg3)]
• No glue logic between blocks, no matter what the temptation
• Under time pressure, a bug may turn up that could be fixed simply by adding some glue logic. RESIST THE TEMPTATION!!!
• At this level in the hierarchy, the glue logic cannot be absorbed within any lower-level block

Separate design with different goals
[Figure: Top containing a time-critical path driving reg1 and slow logic driving reg3]
• reg1 may be driven by a time-critical function, hence will have different optimization constraints
• reg3 may be driven by slow logic, hence no need to constrain it for speed

Optimization based on design requirements
[Figure: Top containing a speed-optimized block (time-critical path, reg1) and an area-optimized block (slow logic, reg3)]
• Use different entities to partition design blocks
• Allows different constraints during synthesis to optimize for area or speed or both
Separate FSM with random logic
• Separation of the FSM and the random logic allows you to use FSM-optimized synthesis
[Figure: Top containing an FSM (reg1), handled with an FSM optimization tool, and Random Logic (reg3), handled with standard optimization techniques]

Maintain a reasonable block size
• Partition your design such that each block is between 1000 and 10000 gates (this is strictly tool and technology dependent)
• The larger the blocks, the longer the run time, so quick iterations cannot be done
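
Worked example: slack and minimum clock period

As a check on the timing report after synthesis shown earlier, the sketch below recomputes the data required time and the slack from the reported numbers, then applies the Critical Path (2) relations to estimate the minimum clock period for that path. It is a rough illustration under stated assumptions: only the single reported path (-max_paths 1) is considered, the clock latency, uncertainty, and setup time are assumed to stay the same at any period, and the 1.0 ns input external delay is counted as part of the path. Variable names are hypothetical.

```python
# Numbers (in ns) copied from the timing report after synthesis.
clock_period = 10.00
clock_latency = 0.10       # clock network delay (ideal)
clock_uncertainty = 0.10   # setup uncertainty
setup_time = 0.12          # library setup time of the endpoint flip-flop
data_arrival = 5.57        # data arrival time at RegSUM/Q_reg[34]/D

# Setup check: data must arrive no later than the required time.
data_required = clock_period + clock_latency - clock_uncertainty - setup_time
slack = data_required - data_arrival

# Smallest period at which this path would still meet setup (slack = 0),
# assuming every other term is unchanged.
min_period = clock_period - slack
max_freq_mhz = 1000.0 / min_period   # 1 / ns expressed in MHz

print(f"data required time:  {data_required:.2f} ns")
print(f"slack:               {slack:.2f} ns")
print(f"min clock period:    {min_period:.2f} ns")
print(f"max clock frequency: {max_freq_mhz:.1f} MHz")
```

The first two values match the report (9.88 ns and 4.31 ns). The frequency estimate applies only to this path and these constraints; a real limit would come from static timing analysis over all paths after layout.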