What is an “SoC”?
• SoC = SOC = System on Chip = System on a Chip
• Wider use: a chip that implements a complete system
• More common use: a chip with one or more CPU cores, peripheral interface blocks, and dedicated HW blocks around a system bus

What is ASIC, FPGA, SoC?
[Figure: design styles ordered by how much of the chip is pre-designed]
• Layout not pre-designed: Full-custom
• Individual gates and memory pre-designed: ASIC
• All layout except wires pre-designed: Gate-array (Structured ASIC)
• Whole chip pre-designed: SoC, FPGA

Inside an FPGA
[Figure] CLB: Configurable Logic Block

An example of an ARM based MCU SoC
[Figure]

From a Designer’s Perspective
• ASIC, FPGA, SoC: all the same from a designer’s point of view
• We are in the SoC age =>
• Shop for IP blocks (IP block = Library block)
• Integrate them with each other and your design

What is ASIC?
• IC
• Full-custom IC
• IC = SP or ASSP
• SP = Standard Product = memory chip, processor
• ASSP = Application Specific Standard Product = USB interface chip, for example
• ASIC => Think of Vestel or Cisco: an equipment = box = system maker that buys ICs (SP or ASSP) and puts them on a PCB. They sometimes need extra logic => hence ASIC (Application Specific Integrated Circuit)

Contemporary (wider) meaning of ASIC
• The previous slide described the original (narrow) meaning of ASIC (how the word ASIC came about)
• Such chips required quick design methods because:
• design time was constrained
• design personnel were constrained
• designs were not so aggressive
• This resulted in what we call: ASIC Design Flow
• Hence: an “ASIC Designer” doing “ASIC Design” may be working on an SP done in ASIC Design Flow as opposed to Full-Custom Flow.

Why/when design your own chip or customize an SoC?
As opposed to taking a CPU and writing code that runs on it. BECAUSE:
• CPU solution is not fast enough (FPGA is slower but offers more parallelism)
• CPU is too expensive
• CPU sucks too much power
• CPU cannot meet the exact I/O timing requirements (no later, no earlier)
• CPU does not have the right number and mix of I/O pins
• Form-factor: CPU is too big and/or requires a heat sink, fan, and/or chip-set

a LOOK at the SECTOR

Top Semi Companies (2011)
(Legend on the original slide: Fabless semi; Fab = Foundry)
 1. Intel (USA):             $50B
 2. Samsung (Korea):         $29B/$260B+
 3. TSMC (Taiwan):           $15B
 4. TI (USA):                $14B
 5. Toshiba (Japan):         $13B/$80B
 6. Renesas (Japan):         $11B
 7. Qualcomm (USA):          $10B
 8. STMicro (Fr-Ita):        $10B
 9. Hynix (Korea):           $9B
10. Micron (USA):            $7B
11. Broadcom (USA):          $7B
12. AMD (USA):               $6B
13. Infineon (Germany):      $5B
14. Sony (Japan):            $5B/$90B
15. Freescale (USA):         $4B
16. Elpida (Japan):          $4B
17. NXP (Holland):           $4B
18. UMC (Taiwan):            $4B
19. NVIDIA (USA):            $4B
20. Globalfoundries (USA):   $4B
FPGA market size:            $5B

Top FPGA (=PLD=CPLD) Companies (all with HQs in the USA)
1. Xilinx:                   49%
2. Altera:                   40%
3. Lattice:                  6%
4. Microsemi (was Actel):    4%
5. Quicklogic:               1%

DESIGN ISSUES

ASIC Implementation Flow
• ASIC Design: 3-12 months
• Fabrication: ~ 2 months
• Package/Test: ~ 1 month
• Validation: ~ 1 month
• SW tools = $100K - $1M
• NRE = $100K - $4M

ASIC
• NRE
• Lower unit cost in high volume
• Faster
• Lower power
• Higher levels of integration
• More analog integration

FPGA
• No NRE
• Lower unit cost in low volume
• Cheaper or free design tools
• Fast time to market
• Low barrier to entry
• Programmable

- Next few slides are Courtesy of Xilinx (DAC 2001)

ASIC Design Flow
• Specification & Arch. => spec (behav. code)
• Front-End Design => HDL RTL
• Front-End Verification (on HDL RTL)
• Synthesis/Timing => HDL gates
• Back-End Design => Layout in GDSII
• Back-End Verification (Timing, GateSim, Formal, DRC, LVS)

ASIC Design Tool-set
• Front-End Design: Editor
• Front-End Verification: Simulator SW
• Synthesis/Timing: Synthesis SW, Stdcell Library
• Back-End Design and Back-End Verification (Timing, GateSim, Formal, DRC, LVS): Physical design, verif., DFT/ATPG SWs

Top EDA Companies (all with HQs in the USA)
1. Synopsys:          $1500M
2. Mentor Graphics:   $900M
3. Cadence:           $850M
4. Other:             27%
(Above are my 2010 estimates. Total market size: $4.5B)

FPGA Design Flow
• Specification & Arch. => spec (behav. code)
• Front-End Design => HDL RTL
• Front-End Verification (on HDL RTL)
• Synthesis, Back-end, Timing => Bitfile
• Back-End Verification (Timing, GateSim, Formal, DRC, LVS)

FPGA Design Tool-set for Xilinx
• Xilinx ISE: Editor, Simulator, Synthesis, all in one IDE
• Front-End Design => HDL RTL
• Front-End Verification (on HDL RTL)
• Synthesis, Back-end, Timing => Bitfile

MODERN DIGITAL DESIGN - BASICS -

You hardly need anything you learned in your Logic course in Modern (HDL and Synthesis based) Digital Design because:
• We write code
• We don’t design circuits
• At least no gate-level circuits
• We don’t care about theorems in Boolean Algebra
• We don’t care about Karnaugh-maps
• The synthesis SW (compiler) does the logic minimization for us
• The FPGA has 1000s of gates anyway
• (OK, in some extreme cases we may need to care)
• Before we care about area minimization we need to care about meeting timing

We write RTL code
• What is RTL code?
• What is the RTL programming paradigm?
• What does RTL mean in the first place?
• RTL = RT-Level = Register Transfer Level

What is RT-Level digital (logic) design?
Everything is a STATE MACHINE!

Your (RTL) code describes the logic cloud
[Figure: a Cloud of Logic (Combinational) with Inputs and storedVars going in, and Outputs and storedVars_next coming out; storedVars_next is captured by a Flop (and more Flops) whose output is storedVars]

For example:
INCREMENTER
[Figure sequence: an INCREMENTER block clocked by clk, shown at successive points in time]

Key points in this programming paradigm:
• What are we programming?
• How will we program? (Any guidelines?)
• What is a “flop” by the way?

Flop: What is it?
• Edge-Triggered D-Type Flip Flop = D-Type Flip Flop = Flip-Flop = Flop
• Edge-Triggered Flip-Flop, as opposed to: Level-Sensitive Transparent Latch = Latch
• Flop = 1-bit DigiCam: it takes a snapshot of D at every posedge of clk and shows it on Q
[Figure: flop symbol with D, clk, Q]

Flop: explained with WAVEFORMS
[Figure: waveforms of clk, D, Q]

2 Flops back to back = Shift Register
[Figure: D feeds flop1, Q1 feeds flop2, Q2 is the output; both flops share clk; waveforms of clk, D, Q1, Q2]

How a FLOP behaves (shown with a SHIFT REGISTER)
• t = before posedge clk: input is 1, flop1 holds and outputs 0, flop2 holds and outputs 1
• t = posedge clk: flop1 captures 1 and flop2 captures 0 (the old output of flop1); both outputs still show the old values (0 and 1)
• t = posedge clk + C2Q delay: flop1 outputs 1, flop2 outputs 0
• C2Q delay: like good cholesterol

SWITCH = LATCH
Latch = Transparent Latch
[Figures: the latch drawn as a switch between D and Q controlled by clk (= enable), and as a 2-input mux (inputs 0 and 1) selected by clk (= enable)]

FLOP = 2 back-to-back LATCHes
[Figure: latch (master) clocked by clk1 followed by latch (slave) clocked by clk2; clk1 and clk2 are NON-OVERLAPPING clocks generated from clk; the ClockToQ (C2Q) delay is marked on the waveforms]

Key points in this programming paradigm:
• What are we programming? Your program DESCRIBES this
[Figure: logic cloud and flops, clocked by clk]

Key points in this programming paradigm:
• What are we programming? Your program DESCRIBES ONE CYCLE

Key points in this programming paradigm:
• How will we program? Any guidelines?
That brings us to…

VERILOG TUTORIAL - BASICS -

example design: counter
(The slides build the module up step by step; each step below shows the code as it stands at that point.)

Step 1: empty module

    module counter();
    endmodule

Step 2: 4-bit output cnt

    module counter( cnt );
    output [3:0] cnt;
    endmodule

Step 3: 1-bit input btn

    module counter( cnt, btn );
    output [3:0] cnt;
    input btn;
    endmodule

Step 4: clock input clk
[Figure: btn and clk go in; an always @(*) cloud produces the 4-bit cntNxt, which a flop turns into the 4-bit cnt]

    module counter(cnt, btn, clk);
    output [3:0] cnt;
    input btn, clk;
    endmodule

Step 5: the flop

    module counter(cnt, btn, clk);
    output [3:0] cnt;
    input btn, clk;
    reg [3:0] cnt, cntNxt;

    always @(posedge clk) begin
      cnt <= #1 cntNxt;
    end
    endmodule

Step 6: the logic cloud (first attempt: cntNxt is not assigned when btn is 0)
[Figure: a mux selected by btn chooses between cnt (input 0) and cnt+1 (input 1)]

    module counter(cnt, btn, clk);
    output [3:0] cnt;
    input btn, clk;
    reg [3:0] cnt, cntNxt;

    always @(posedge clk) begin
      cnt <= #1 cntNxt;
    end

    always @(*) begin
      if(btn) cntNxt = cnt + 1;
    end
    endmodule

Step 7: default assignment added

    module counter(cnt, btn, clk);
    output [3:0] cnt;
    input btn, clk;
    reg [3:0] cnt, cntNxt;

    always @(posedge clk) begin
      cnt <= #1 cntNxt;
    end

    always @(*) begin
      cntNxt = cnt;
      if(btn) cntNxt = cnt + 1;
    end
    endmodule

Step 8: count only on the rising edge of btn (prevBtn remembers the previous btn)

    module counter(cnt, btn, clk);
    output [3:0] cnt;
    input btn, clk;
    reg [3:0] cnt, cntNxt;
    reg prevBtn, posedgeBtn;

    always @(posedge clk) begin
      cnt <= #1 cntNxt;
      prevBtn <= #1 btn;
    end

    always @(*) begin
      cntNxt = cnt;
      posedgeBtn = ~prevBtn & btn;
      if(posedgeBtn) cntNxt = cnt + 1;
    end
    endmodule

Variant: edge detection written inline

    module counter(cnt, btn, clk);
    output [3:0] cnt;
    input btn, clk;
    reg [3:0] cnt, cntNxt;
    reg prevBtn;

    always @(posedge clk) begin
      cnt <= #1 cntNxt;
      prevBtn <= #1 btn;
    end

    always @(*) begin
      cntNxt = cnt;
      if(~prevBtn & btn) cntNxt = cnt + 1;
    end
    endmodule

Variant: edge detection with assign (posedgeBtn is a wire)

    module counter(cnt, btn, clk);
    output [3:0] cnt;
    input btn, clk;
    reg [3:0] cnt, cntNxt;
    reg prevBtn;
    wire posedgeBtn;

    always @(posedge clk) begin
      cnt <= #1 cntNxt;
      prevBtn <= #1 btn;
    end

    assign posedgeBtn = ~prevBtn & btn;

    always @(*) begin
      cntNxt = cnt;
      if(posedgeBtn) cntNxt = cnt + 1;
    end
    endmodule

Variant: edge detection moved into a submodule posDet (inputs clk and btn, output posedgeBtn)

    module counter(cnt, btn, clk);
    output [3:0] cnt;
    input btn, clk;
    reg [3:0] cnt, cntNxt;
    wire posedgeBtn;   // prevBtn is no longer needed in this module

    always @(posedge clk) begin
      cnt <= #1 cntNxt;
    end

    posDet posDet(clk, btn, posedgeBtn);

    always @(*) begin
      cntNxt = cnt;
      if(posedgeBtn) cntNxt = cnt + 1;
    end
    endmodule

Variant: the assign replaced by its own always @(*) (posedgeBtn must then be declared reg)

    module counter(cnt, btn, clk);
    output [3:0] cnt;
    input btn, clk;
    reg [3:0] cnt, cntNxt;
    reg prevBtn;
    reg posedgeBtn;

    always @(posedge clk) begin
      cnt <= #1 cntNxt;
      prevBtn <= #1 btn;
    end

    always @(*) posedgeBtn = ~prevBtn & btn;

    always @(*) begin
      cntNxt = cnt;
      if(posedgeBtn) cntNxt = cnt + 1;
    end
    endmodule

The final slide returns to the Step 8 version, with both assignments in a single always @(*).

Expressing ALGORITHMs in RT-Level paradigm?
1. Think of your HW module as a netlist of HW submodules.
2. Each submodule can in turn be a netlist of subsubmodules.
3. Leaf modules can be expressed by behavior that can be synthesized: (what we call) RTL code.
4. RTL is how we express an algorithm in HW.
5. Break your algorithm into clock cycles.
6. You have to specify what is done in each cycle.

Expressing ALGORITHMs in RT-Level paradigm? – cont’d
7. Think of it as a STATE MACHINE where every state is executed in a different cycle.
8. Store everything that needs to be remembered between states (= cycles) in explicitly coded REGISTERs.
9. Store also the STATE in an explicitly coded register.
10. At the top of the combinational block, put a case(STATE).
11. What you will really code other than the registers is actually a Truth Table coded with a high-level language.
12. That is: Outputs depend only on inputs, which are external inputs plus register outputs.

GOLDEN RULES

GOLDEN RULE 1
NO COMBINATIONAL LOOP
[Figure: an always @(*) block whose output cntNxt feeds back into its own input]

    // cntNxt depends on itself
    always @(*) begin
      if(cntNxt) cntNxt = cnt - 1;
    end

    // cntNxt is not assigned when cnt is 0, so it has to hold its old value
    always @(*) begin
      if(cnt) cntNxt = cnt - 1;
    end

    // the same loop, written out explicitly
    always @(*) begin
      if(cnt) cntNxt = cnt - 1;
      else cntNxt = cntNxt;
    end

    // no loop: cntNxt is assigned on every path and depends only on cnt
    always @(*) begin
      if(cnt) cntNxt = cnt - 1;
      else cntNxt = cnt;
    end

GOLDEN RULE 1 – IMPLICATION
Always have DEFAULT ASSIGNMENTS at the top of always @(*)

    always @(*) begin
      cntNxt = cnt;
      if(cnt) cntNxt = cnt - 1;
    end

    // cntNxt is read here after the default assignment above
    always @(*) begin
      cntNxt = cnt;
      if(cntNxt) cntNxt = cnt - 1;
    end

GOLDEN RULE 2
NO INDIRECT COMBINATIONAL LOOPS
[Figure: several always @(*) blocks feeding each other in a ring]
always @(*) and assign are equivalent

GOLDEN RULE 3
NO MULTIPLE DRIVERS
[Figure: two always @(*) blocks both driving sameVar]

    // two blocks drive cntNxt
    always @(*) begin
      cntNxt = cnt;
      if(btn1) cntNxt = cnt + 1;
    end

    always @(*) begin
      cntNxt = cnt;
      if(btn2) cntNxt = cnt - 1;
    end

    // Merge in a single always (btn2 wins if both are pressed)
    always @(*) begin
      cntNxt = cnt;
      if(btn1) cntNxt = cnt + 1;
      if(btn2) cntNxt = cnt - 1;
    end

GOLDEN RULE 3
NO MULTIPLE DRIVERS
Extra input may be needed
[Figure: two always @(*) blocks produce var_v1 and var_v2; a third always @(*) block, an Arbiter (~~~ Priority Encoder), combines them into var]

GOLDEN RULE 4
Do NOT write anything in always @(posedge clk) blocks other than flop definitions,
i.e. Flop <= #1 FlopNxt

GOLDEN RULE 5
SINGLE CLOCK DOMAIN
- unless really necessary
- extra care needed for signals between different clock domains
[Figure: a signal "in" crossing from logic clocked by clk to logic clocked by clk’, where clk’ = derived clk = divided clk = gated clk]

GOLDEN RULE 6
Do NOT ignore warning messages other than the ones for #1’s.

GOLDEN RULE 7
Write a Testbench and Simulate! It is well worth the time.

HANDLING MULTIPLE CLOCKS
• Clocks with different frequencies
• Clocks with the same frequency but different phases between them

HANDLING MULTIPLE CLOCKS
• Setup Time and Hold Time violations
• Metastability
[Figure: D against Clock, with the Setup Time before the clock edge and the Hold Time after it; a Metastable state shown between Stable 0 and Stable 1]

HANDLING MULTIPLE CLOCKS
• Clock nomenclature
• Design partitioning
• One module should work on one clock only
• A synchronizer module should be made for all signals that cross from one clock domain to another
[Figure: Clock1 logic in the Clock1 domain and Clock2 logic in the Clock2 domain; Clk1_SigA goes through Sync 1to2 and becomes Clk2_SigC; Clk2_SigD goes through Sync 2to1 and becomes Clk1_SigB]

HANDLING MULTIPLE CLOCKS
• Transfer of Control Signals
[Figure: src_ctrl from the Src clock domain passes through a Two-stage synchronizer clocked by dest_clk and becomes dest_ctrl in the Dest clock domain]

HANDLING MULTIPLE CLOCKS
• Transfer of Data Signals
• Handshake signaling method
[Figure: X clock domain (xclk) sends xreq and data to Y clock domain (yclk)]

HANDLING MULTIPLE CLOCKS
• Transfer of Data Signals
• Asynchronous FIFO
[Figure: X clock domain (xclk) writes into the FIFO and sees fifo_full; Y clock domain (yclk) reads from the FIFO and sees fifo_empty; the flags cross domains through Two-stage synchronizers]
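The two-stage synchronizer named on the "Transfer of Control Signals" slide can be sketched in the same coding style as the counter. This is only a minimal sketch under stated assumptions: the port names src_ctrl, dest_ctrl and dest_clk come from the slide labels, while the module name sync2, the internal register meta and the small testbench sync2_tb are hypothetical names chosen for illustration. It also assumes that src_ctrl is a single-bit signal that stays stable for at least a couple of dest_clk cycles.

```verilog
`timescale 1ns/1ps

// Two-stage synchronizer for a 1-bit control signal (illustrative sketch).
module sync2(dest_clk, src_ctrl, dest_ctrl);
  input  dest_clk;   // clock of the destination domain
  input  src_ctrl;   // control signal coming from the source domain
  output dest_ctrl;  // synchronized copy, to be used in the destination domain

  reg meta;          // first stage: the flop that may go metastable
  reg dest_ctrl;     // second stage: gives the first stage one cycle to settle

  // Only flop definitions in the clocked block (Golden Rule 4).
  always @(posedge dest_clk) begin
    meta      <= #1 src_ctrl;
    dest_ctrl <= #1 meta;
  end
endmodule

// Self-checking testbench (Golden Rule 7); hypothetical, for illustration.
module sync2_tb;
  reg  dest_clk, src_ctrl;
  wire dest_ctrl;
  reg  failed;

  sync2 dut(dest_clk, src_ctrl, dest_ctrl);

  initial dest_clk = 0;
  always #5 dest_clk = ~dest_clk;   // posedges at t = 5, 15, 25, 35, ...

  initial begin
    failed   = 0;
    src_ctrl = 0;
    #22 src_ctrl = 1;                         // t = 22: changes between clock edges
    #8  if (dest_ctrl !== 1'b0) failed = 1;   // t = 30: only one posedge seen so far
    #10 if (dest_ctrl !== 1'b1) failed = 1;   // t = 40: second posedge has passed
    if (failed) $display("FAIL"); else $display("PASS");
    $finish;
  end
endmodule
```

The change on src_ctrl reaches dest_ctrl only after two rising edges of dest_clk, which is the price paid for giving the first flop time to leave a metastable state. A sketch like this is suitable for single-bit control signals only; multi-bit data would use the handshake or asynchronous FIFO methods listed on the slides.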