Digital Design Chapter 5: Register-Transfer Level (RTL) Design Slides to accompany the textbook Digital Design, First Edition, by Frank Vahid, John Wiley and Sons Publishers, 2007. http://www.ddvahid.com Copyright © 2007 Frank Vahid Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites.. PowerPoint source (or pdf Digital Design with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Copyright © 2006 1 Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors Franksource Vahidor obtain special use permissions from Wiley – see http://www.ddvahid.com for information. may obtain PowerPoint 5.1 • Chapter 3: Controllers – Control input/output: single bit (or just a few) representing event or state – Finite-state machine describes behavior; implemented as state register and combinational logic bi bo Combinational logic n1 s1 FSM outputs FSM inputs Introduction n0 s0 State register clk • Chapter 4: Datapath components – Data input/output: Multiple bits collectively representing single entity – Datapath components included registers, adders, ALU, comparators, register files, etc. • This chapter: custom processors – Processor: Controller and datapath components working together to implement an algorithm Register si bi ansis ALU bo e z Combinational logic Comparator Register file Register file n1 n0 s1 s0 State register ALU Datapath Digital Design Copyright © 2006 Frank Vahid Controller Note: Slides with animation are denoted with a small red "a" near the animated items 2 RTL Design: Capture Behavior, Convert to Circuit • Recall – Chapter 2: Combinational Logic Design • First step: Capture behavior (using equation or truth table) • Remaining steps: Convert to circuit – Chapter 3: Sequential Logic Design Capture behavior • First step: Capture behavior (using FSM) • Remaining steps: Convert to circuit • RTL Design (the method for creating custom processors) Convert to circuit – First step: Capture behavior (using highlevel state machine, to be introduced) – Remaining steps: Convert to circuit Digital Design Copyright © 2006 Frank Vahid 3 RTL Design Method Digital Design Copyright © 2006 Frank Vahid 5.2 4 RTL Design Method: “Preview” Example • Soda dispenser – c: bit input, 1 when coin deposited c – a: 8-bit input having value of d deposited coin – s: 8-bit input having cost of a soda – d: bit output, processor sets to 1 when total value of deposited coins equals or 0 1 0 1 0 exceeds cost of a soda c d 0 1 0 Digital Design Copyright © 2006 Frank Vahid How can we precisely describe this processor’s behavior? s a Soda dispenser processor s 50 a 25 25 Soda tot: tot: dispenser 25 processor50 a 5 Preview Example: Step 1 -Capture High-Level State Machine • Declare local register tot • Init state: Set d=0, tot=0 • Wait state: wait for coin – If see coin, go to Add state • Add state: Update total value: tot = tot + a – Remember, a is present coin’s value – Go back to Wait state • In Wait state, if tot >= s, go to Disp(ense) state • Disp state: Set d=1 (dispense soda) – Return to Init state Digital Design Copyright © 2006 Frank Vahid s a 8 c d 8 Soda dispenser processor Inputs: c (bit), a (8 bits), s (8 bits) Outputs: d (bit) Local registers: tot (8 bits) c Add Init d=0 tot=0 Wait c’*(tot<s)’ tot=tot+a c’*(tot<s) Disp d=1 6 Preview Example: Step 2 -- Create Datapath • Need tot register • Need 8-bit comparator to compare s and tot • Need 8-bit adder to perform tot = tot + a • Wire the components as needed for above • Create control input/outputs, give them names c Add Init Wait d=0 t ot=0 c‘ (t ot<s)‘ tot= t ot+a c‘ *(tot<s) Disp d=1 s tot_ld a ld clr tot_clr 8 tot_lt_s Digital Design Copyright © 2006 Frank Vahid Inputs : c (bit), a(8 bits), s (8 bits) Outputs : d (bit) Local reg isters: t ot (8 bits) tot 8 8-bit < Datapath 8 8-bit adder 8 7 Preview Example: Step 3 – Connect Datapath to a Controller • Controller’s inputs s tot_ld – External input c (coin detected) – Input from datapath comparator’s output, which we named tot_lt_s a ld clr tot_clr 8 8 8-bit < tot_lt_s Datapath s tot 8 8-bit adder 8 a 8 8 • Controller’s outputs – External output d (dispense soda) – Outputs to datapath to load and clear the tot register c d tot_ld tot_clr Controller Digital Design Copyright © 2006 Frank Vahid tot_lt_s Datapath 8 Preview Example: Step 4 – Derive the Controller’s FSM Digital Design Copyright © 2006 Frank Vahid d tot_ld tot_clr Datapath c Controller • Same states and arcs as high-level state machine • But set/read datapath control signals for all c datapath operations d and conditions s a 8 8 tot_lt_s s Inputs::c, tot_lt_s(bit) Outputs:d, tot_ld, tot_clr (bit) tot_ld tot_ld tot_clr c Add Init d=0 tot_clr=1 Wait tot_ld=1 c’*tot_lt_s a ld tpt clr 8 8 tot_clr tot_lt_s tot_lt_s 8-bit < Datapath 8 8-bit adder 8 Disp Controller d=1 9 Preview Example: Completing the Design n0 d tot_clr 0 0 1 0 0 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 Add 0 1 1 0 0 1 1 1 1 1 0 Disp 1 1 0 0 0 0 1 0 0 Init n1 tot_ld – As in Ch3 – Table shown on right tot_lt_s • Implement the FSM as a state register and logic tot_ld c d c Init Wait d=0 tot_clr=1 Add tot_clr tot_ld=1 tot_lt_s c’*tot_lt_s Disp Controller Digital Design Copyright © 2006 Frank Vahid Wait Inputs::c, tot_lt_s(bit) Outputs:d, tot_ld, tot_clr (bit) s1 s0 c 0 0 0 0 0 0 0 1 0 0 d=1 10 Step 1: Create a High-Level State Machine • Let’s consider each step of the RTL design process in more detail Inputs: c (bit), a (8 bits), s (8 bits) Outputs: d (bit) • Step 1 – Soda dispenser example – Not an FSM because: • Multi-bit (data) inputs a and s • Local register tot • Data operations tot=0, tot<s, tot=tot+a. – Useful high-level state machine: • Data types beyond just bits • Local registers • Arithmetic equations/expressions Digital Design Copyright © 2006 Frank Vahid Local registers: tot (8 bits) c Init d=0 tot=0 Wait c’(tot<s)’ tot= tot+a c’ (tot<s) Disp d=1 11 Step 1 Example: Laser-Based Distance Measurer T (in seconds) laser D Object of interest sensor 2D = T sec * 3*108 m/sec • Example of how to create a high-level state machine to describe desired processor behavior • Laser-based distance measurement – pulse laser, measure time T to sense reflection – Laser light travels at speed of light, 3*108 m/sec – Distance is thus D = T sec * 3*108 m/sec / 2 Digital Design Copyright © 2006 Frank Vahid 12 Step 1 Example: Laser-Based Distance Measurer T (in seconds) laser sensor from button B D to display L 16 Laser-based distance measurer to laser S from sensor • Inputs/outputs – – – – B: bit input, from button to begin measurement L: bit output, activates laser S: bit input, senses laser reflection D: 16-bit output, displays computed distance Digital Design Copyright © 2006 Frank Vahid 13 Step 1 Example: Laser-Based Distance Measurer from button B L Inputs: B, S(1 bit each) Outputs: L (bit), D (16 bits) to display S0 a D 16 Laserbased distance measurer S to laser from sensor ? L = 0 (laser off) D = 0 (distance = 0) • Step 1: Create high-level state machine • Begin by declaring inputs and outputs • Create initial state, name it S0 – Initialize laser to off (L=0) – Initialize displayed distance to 0 (D=0) Digital Design Copyright © 2006 Frank Vahid 14 Step 1 Example: Laser-Based Distance Measurer from button B Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) B’ (button not pressed) to display D L 16 Laserbased distance measurer S to laser from sensor a S0 ? S1 L=0 D=0 B (button pressed) • Add another state, call S1, that waits for a button press – B’ – stay in S1, keep waiting – B – go to a new state S2 Q: What should S2 do? Digital Design Copyright © 2006 Frank Vahid A: Turn on the laser a 15 Step 1 Example: Laser-Based Distance Measurer from button B Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) to display D L Laserbased distance measurer 16 S to laser from sensor B’ S0 L=0 D=0 S1 B S2 S3 L=1 (laser on) L=0 (laser off) a • Add a state S2 that turns on the laser (L=1) • Then turn off laser (L=0) in a state S3 Q: What do next? A: Start timer, wait to sense reflection a Digital Design Copyright © 2006 Frank Vahid 16 Step 1 Example: Laser-Based Distance Measurer Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) from button Local Registers: Dctr (16 bits) to display B’ B D L 16 Lase r-based distance measu rer S to laser from sensor S’ (no reflection) S0 S1 L=0 D=0 Dctr = 0 (reset cycle count) B S2 L=1 S3 S (reflection) ? L=0 Dctr = Dctr + 1 (count cycles) a • Stay in S3 until sense reflection (S) • To measure time, count cycles for which we are in S3 – To count, declare local register Dctr – Increment Dctr each cycle in S3 – Initialize Dctr to 0 in S1. S2 would have been O.K. too Digital Design Copyright © 2006 Frank Vahid 17 Step 1 Example: Laser-Based Distance Measurer from button Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) Local Registers: Dctr (16 bits) to display B D L 16 Lase r-based distance measu rer S to laser from sensor S’ B’ a S0 S1 L=0 D=0 Dctr = 0 B S2 L=1 S3 S S4 L=0 D = Dctr / 2 Dctr = Dctr + 1 (calculate D) • Once reflection detected (S), go to new state S4 – Calculate distance – Assuming clock frequency is 3x108, Dctr holds number of meters, so D=Dctr/2 • After S4, go back to S1 to wait for button again Digital Design Copyright © 2006 Frank Vahid 18 Step 2: Create a Datapath • Datapath must – Implement data storage – Implement data computations • Look at high-level state machine, do three substeps – (a) Make data inputs/outputs be datapath inputs/outputs – (b) Instantiate declared registers into the datapath (also instantiate a register for each data output) – (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations Digital Design Copyright © 2006 Frank Vahid Instantiate: to introduce a new component into a design. 19 Step 2 Example: Laser-Based Distance Measurer Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) (a) Make data Local Registers: Dctr (16 bits) inputs/outputs be datapath B‘ S‘ inputs/outputs (b) Instantiate declared S4 S0 S1 S2 S3 registers into the B S datapath (also L=0 Dctr = 0 L=1 L=0 D = Dctr / 2 instantiate a D=0 Dctr = Dctr + 1 (calculate D) register for each a data output) Datapath (c) Examine every Dreg_clr state and Dreg_ld transition, and clear clear I Dctr_clr instantiate Dctr: 16-bit Dreg: 16-bit count Dctr_cnt load up-counter register datapath Q Q components and connections to implement any 16 data computations D Digital Design Copyright © 2006 Frank Vahid 20 Step 2 Example: Laser-Based Distance Measurer (c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) Local Registers: Dctr (16 bits) B‘ S‘ S0 S1 L=0 D=0 Dctr = 0 B S2 L=1 S3 S S4 L=0 D = Dctr / 2 Dctr = Dctr + 1 (calculate D) a Datapath >>1 16 Dreg_clr Dreg_ld Dctr_clr Dctr_cnt clear count clear Dctr: 16-bit up-counter I load Q Dreg: 16-bit register Q 16 16 D Digital Design Copyright © 2006 Frank Vahid 21 Step 2 Example Showing Mux Use Localregisters: E,F, G, R (16 bits) E F G E F G T0 R = E + F A + B A + T1 R = R + G add_A_s0 add_B_s0 F 1 2× A a (a) B E R R (b) (c) G 1 2× + B R (d) • Introduce mux when one component input can come from more than one source Digital Design Copyright © 2006 Frank Vahid 22 Step 3: Connecting the Datapath to a Controller from button L B Controller from sensor S Dreg_clr Dreg_ld Dctr_clr Datapath Dctr_cnt D to display 16 to laser 300 MHz Clock • Laser-based distance measurer example • Easy – just connect all control signals between controller and datapath Datapath >>1 16 Dreg_clr Dreg_ld Dctr_clr Dctr_cnt clear count Q Digital Design Copyright © 2006 Frank Vahid clear load Dctr: 16-bit up-counter 16 I Dreg: 16-bit register Q 16 D 23 Step 4: Deriving the Controller’s FSM from butt on Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) Local Registers: Dctr (16 bits) L B to laser Controller from sensor S Dreg_clr B’ Dreg_ld Dctr_clr Datapath Dctr_cnt D to display 16 S’ 300 MHz Clock S0 S1 L=0 D=0 Dctr = 0 S2 B L=1 Inputs: B, S • FSM has same Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt structure as highB’ level state machine – Inputs/outputs all bits now – Replace data operations by bit operations using datapath Digital Design Copyright © 2006 Frank Vahid S3 S S4 L=0 D = Dctr / 2 Dctr = Dctr + 1 (calculate D) S’ a S0 S1 L=0 Dreg_clr = 1 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 0 (laser off) (clear D reg) L=0 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 1 Dctr_cnt = 0 (clear count) B S S2 S3 L=1 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 0 (laser on) L=0 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 1 (laser off) (count up) S4 L=0 Dreg_clr = 0 Dreg_ld = 1 Dctr_clr = 0 Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)24 Step 4: Deriving the Controller’s FSM B’ • Using shorthand of outputs not assigned implicitly assigned 0 S’ S0 S1 L=0 Dreg_clr = 1 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 0 (laser off) (clear D reg) L=0 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 1 Dctr_cnt = 0 (clear count) Inputs: B, S B S S2 S3 L=1 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 0 (laser on) L=0 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 1 (laser off) (count up) S4 L=0 Dreg_clr = 0 Dreg_ld = 1 Dctr_clr = 0 Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting) Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt B’ S’ a S0 S1 L=0 Dreg_clr = 1 (laser off) (clear D reg) Dctr_clr = 1 (clear count) Digital Design Copyright © 2006 Frank Vahid B S S2 S3 L=1 (laser on) L=0 Dctr_cnt = 1 (laser off) (count up) S4 Dreg_ld = 1 Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting) 25 Step 4 L Dreg_clr Dreg_ld Dctr_clr Datapath B Controller from button S to laser from sensor >>1 16 Dreg_clr Dreg_ld clear count Dctr_clr Dctr_cnt Dctr_cnt to display Datapath D 16 Q 300 MHz Clock clear load Dctr: 16-bit up-counter 16 I Dreg: 16-bit register Q 16 D Inputs: B, S Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt B’ S0 S1 L=0 Dreg_clr = 1 (laser off) (clear D reg) Dctr_clr = 1 (clear count) Digital Design Copyright © 2006 Frank Vahid S’ B S S2 S3 L=1 (laser on) L=0 Dctr_cnt = 1 (laser off) (count up) • Implement S4 FSM as state register and Dreg_ld = 1 Dctr_cnt = 0 logic (Ch3) to (load D reg with Dctr/2) complete the (stop counting) design 26 5.3 RTL Design Examples and Issues • We’ll use several more examples to illustrate RTL design • Example: Bus interface – Master processor can read register from any peripheral • Each register has unique 4-bit address • Assume 1 register/periph. – Sets rd=1, A=address – Appropriate peripheral places register data on 32-bit D lines • Periph’s address provided on Faddr inputs (maybe from DIP switches, or another register) Digital Design Copyright © 2006 Frank Vahid Master processor rd Per0 32 D 4 A Per1 Per15 to/from processor bus rd D A 32 4 Faddr Bus interface 4 Q 32 Main part Peripheral 27 RTL Example: Bus Interface Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits) Outputs: D (32 bits) Local register: Q1 (32 bits) rd’ ((A = Faddr) and rd’) WaitMyAddress (A = Faddr) and rd D = “Z” Q1 = Q rd SendData D = Q1 • Step 1: Create high-level state machine – State WaitMyAddress • Output “nothing” (“Z”) on D, store peripheral’s register value Q into local register Q1 • Wait until this peripheral’s address is seen (A=Faddr) and rd=1 – State SendData • Output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines) Digital Design Copyright © 2006 Frank Vahid 28 RTL Example: Bus Interface Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits) Outputs: D (32 bits) Local register: Q1 (32 bits) rd’ ((A = Faddr) and rd’) WaitMyAddress (A = Faddr) and rd D = “Z” Q1 = Q Digital Design Copyright © 2006 Frank Vahid rd SendData D = Q1 29 RTL Example: Bus Interface Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits) Outputs: D (32 bits) Local register: Q1 (32 bits) rd’ ((A = Faddr) and rd)’ SendData WaitMyAddress (A = Faddr) D = Q1 and rd D = “Z” Q1 = Q A rd 4 Faddr Q 4 32 Q1_ld ld Q1 = (4-bit) A_eq_Faddr D_en • Step 2: Create a datapath (a) Datapath inputs/outputs (b) Instantiate declared registers (c) Instantiate datapath components and connections Digital Design Copyright © 2006 Frank Vahid 32 32 a Datapath Bus interface D 30 RTL Example: Bus Interface Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits) Outputs: D (32 bits) Local register: Q1 (32 bits) rd’ Inputs: rd, A_eq_Faddr ((A =(bit) Faddr) Outputs: Q1_ld, D_enand (bit)rd)’ ‘ rdSendData rd WaitMyAddress (A = Faddr) D = Q1 and(A_eq_Faddr rd D = “Z” and rd)‘ Q1 = Q SendData WaitMyAddress a D_en = 0 Q1_ld = 1 A_eq_Faddr and rd A rd 4 Q 4 32 Q1_ld rd ld = (4-bit) A_eq_Faddr D_en = 1 Q1_ld = 0 D_en Bus interface • Step 3: Connect datapath to controller • Step 4: Derive controller’s FSM Digital Design Copyright © 2006 Frank Vahid Faddr Q1 32 32 Datapath D 31 RTL Example: Video Compression – Sum of Absolute Only difference: ball moving Differences Frame 1 Frame 2 Frame 1 Frame 2 Digitized Digitized Digitized Difference of frame 1 frame 2 frame 1 2 from 1 1 Mbyte 1 Mbyte 1 Mbyte 0.01 Mbyte (a) (b) • Video is a series of frames (e.g., 30 per second) • Most frames similar to previous frame a Just send difference – Compression idea: just send difference from previous frame Digital Design Copyright © 2006 Frank Vahid 32 RTL Example: Video Compression – Sum of Absolute Differences compare Frame 1 Frame 2 Each is a pixel, assume represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel) • Need to quickly determine whether two frames are similar enough to just send difference for second frame – Compare corresponding 16x16 “blocks” • Treat 16x16 block as 256-byte array – Compute the absolute value of the difference of each array item – Sum those differences – if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described) Digital Design Copyright © 2006 Frank Vahid 33 RTL Example: Video Compression – Sum of Absolute Differences 256-byte array A 256-byte array B SAD sad integer go !(i<256) • Want fast sum-of-absolute-differences (SAD) component – When go=1, sums the differences of element pairs in arrays A and B, outputs that sum Digital Design Copyright © 2006 Frank Vahid 34 RTL Example: Video Compression – Sum of Absolute Differences SAD A sad B Inputs: A, B (256 byte memory); go (bit) Outputs: sad (32 bits) Local registers: sum, sad_reg (32 bits); i (9 bits) go • • • • S0: wait for go S1: initialize sum and index S2: check if done (i>=256) S3: add difference to sum, increment index • S4: done, write to output sad_reg Digital Design Copyright © 2006 Frank Vahid S0 go S1 (i<256)’ !(i<256) !go sum = 0 i=0 a S2 i<256 sum=sum+abs(A[i]-B[i]) S3 i=i+1 S4 sad_reg = sum 35 RTL Example: Video Compression – Sum of Absolute Differences AB_addr Inputs: A, B (256 byte memory); go (bit) Outputs: sad (32 bits) Local registers: sum, sad_reg (32 bits); i (9 bits) S0 i_lt_256 go S1 (i<256)’ sum = 0 i=0 S2 <256 i_clr 8 sum_ld sum_clr Datapath • Step 2: Create datapath Digital Design Copyright © 2006 Frank Vahid 8 – i a i<256 !(i<256) sum=sum+abs(A[i]-B[i]) sad_reg_ld S3 i=i+1 !(i<256) sad_reg=sum S4(i_lt_256) 8 9 i_inc !go A_data B_data sum 32 abs 8 32 32 sad_reg + 32 sad 36 RTL Example: Video Compression – Sum of Absolute Differences go AB_addr AB_rd i_lt_256 go’ S0 go S1 ? i_clr 8 9 8 – i 8 sum_ld S2 i<256 i_lt_256 sum=sum+abs(A[i]-B[i]) S3 sum_ld=1; AB_rd=1 i=i+1 i_inc=1 S4 <256 i_inc sum=0 sum_clr=1 i=0 i_clr=1 A_data B_data sad_reg=sum sad_reg_ld=1 (i_lt_256) a !(i<256) !(i<256) (i_lt_256) Controller sum_clr !(i<256) sum 32 abs 8 32 32 sad_reg_ld sad_reg + 32 sad • Step 3: Connect to controller • Step 4: Replace high-level state machine by FSM Digital Design Copyright © 2006 Frank Vahid 37 RTL Example: Video Compression – Sum of Absolute Differences • Comparing software and custom circuit SAD – Circuit: Two states (S2 & S3) for each i, 256 i’s 512 clock cycles – Software: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i – say about 6 cycles per !(i<256) array item 256*6 = 1536 cycles – Circuit is about 3 times (300%) faster (i_lt_256) !(i<256) – Later, we’ll see how to build SAD circuit that is even faster Digital Design Copyright © 2006 Frank Vahid (i<256)’ S2 i<256 sum=sum+abs(A[i]-B[i]) S3 i=i+1 38 RTL Design Pitfalls and Good Practice • Common pitfall: Assuming register is update in the state it’s written – Final value of Q? – Final state? – Answers may surprise you • Value of Q unknown • Final state is C, not D – Why? • State A: R=99 and Q=R happen simultaneously • State B: R not updated with R+1 until next clock cycle, simultaneously with state register being updated Digital Design Copyright © 2006 Frank Vahid 39 RTL Design Pitfalls and Good Practice • Solutions – Read register in following state (Q=R) – Insert extra state so that conditions use updated value – Other solutions are possible, depends on the example Digital Design Copyright © 2006 Frank Vahid 40 RTL Design Pitfalls and Good Practice • Common pitfall: Reading outputs – Outputs can only be written – Solution: Introduce additional register, which can be written and read Inputs: A, B (8 bits) Outputs: P (8 bits) S T S T P=A P=P+B R=A P=A P=R+B (a) Digital Design Copyright © 2006 Frank Vahid Inputs: A, B (8 bits) Outputs: P (8 bits) Local register: R (8 bits) (b) 41 RTL Design Pitfalls and Good Practice • Good practice: Register all data outputs B R B R – In fig (a), output P would show spurious values as addition computes • Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component – In fig (b), spurious outputs reduced, and longest register-to-register path is clear Digital Design Copyright © 2006 Frank Vahid + + P (a) Preg P (b) 42 Control vs. Data Dominated RTL Design • Designs often categorized as control-dominated or datadominated – Control-dominated design – Controller contains most of the complexity – Data-dominated design – Datapath contains most of the complexity – General, descriptive terms – no hard rule that separates the two types of designs – Laser-based distance measurer – control dominated – Bus interface, SAD circuit – mix of control and data – Now let’s do a data dominated design Digital Design Copyright © 2006 Frank Vahid 43 Data Dominated RTL Design Example: FIR Filter • Filter concept – Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle) – That 240 is probably wrong! • Could be electrical noise – Filter should remove such noise in its output Y – Simple filter: Output average of last N values Y X 12 digital filter 12 clk • Small N: less filtering • Large N: more filtering, but less sharp output Digital Design Copyright © 2006 Frank Vahid 44 Data Dominated RTL Design Example: FIR Filter • FIR filter – “Finite Impulse Response” – Simply a configurable weighted sum of past input values – y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) • Above known as “3 tap” • Tens of taps more common • Very general filter – User sets the constants (c0, c1, c2) to define specific filter Y X 12 digital filter 12 clk y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) – RTL design • Step 1: Create high-level state machine – But there really is none! Data dominated indeed. • Go straight to step 2 Digital Design Copyright © 2006 Frank Vahid 45 Data Dominated RTL Design Example: FIR Filter • Step 2: Create datapath – Begin by creating chain of xt registers to hold past values of X Y X 12 digital filter 12 clk y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) Suppose sequence is: 180, 181, 240 240 180 181 Digital Design Copyright © 2006 Frank Vahid 180 181 180 a 46 Data Dominated RTL Design Example: FIR Filter • Step 2: Create datapath (cont.) – Instantiate registers for c0, c1, c2 – Instantiate multipliers to compute c*x values x(t) c0 xt0 Y X 12 12 digital filter clk y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) 3-tap FIR filter x(t-1) c1 xt1 x(t-2) c2 xt2 X a clk * * * Y Digital Design Copyright © 2006 Frank Vahid 47 Data Dominated RTL Design Example: FIR Filter • Step 2: Create datapath (cont.) Y X 12 digital filter 12 clk – Instantiate adders y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) 3-tap FIR filter x(t) c0 xt0 x(t-1) c1 xt1 x(t-2) c2 xt2 X clk * * + a * + Y Digital Design Copyright © 2006 Frank Vahid 48 Data Dominated RTL Design Example: FIR Filter • Step 2: Create datapath (cont.) Y X 12 – Add circuitry to allow loading of particular c register digital filter 12 clk y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) CL 3-tap FIR filter e Ca1 Ca0 3 2x4 2 1 0 C x(t) c0 xt0 x(t-1) c1 xt1 x(t-2) c2 xt2 a X clk * * + * + yreg Y Digital Design Copyright © 2006 Frank Vahid 49 Data Dominated RTL Design Example: FIR Filter • Step 3 & 4: Connect to controller, Create FSM y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) – No controller needed – Extreme data-dominated example – (Example of an extreme control-dominated design – an FSM, with no datapath) • Comparing the FIR circuit to a software implementation – Circuit • Assume adder has 2-gate delay, multiplier has 20-gate delay • Longest past goes through one multiplier and two adders – 20 + 2 + 2 = 24-gate delay • 100-tap filter, following design on previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on longest path – Software • 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction. • (100*2 + 100*2)*10 = 4000 gate delays – Circuit is more than 100 times faster (10,000% faster). Wow. Digital Design Copyright © 2006 Frank Vahid 50 5.4 Determining Clock Frequency • Designers of digital circuits often want fastest performance clk a b – Means want high clock frequency • Frequency limited by longest register-to-register delay – Known as critical path – If clock is any faster, incorrect data may be stored into register – Longest path on right is 2 ns 2 ns delay + c • Ignoring wire delays, and register setup and hold times, for simplicity Digital Design Copyright © 2006 Frank Vahid 51 Critical Path • Example shows four paths 2 ns delay b + * 7 ns – 1 / 7 ns = 142 MHz Max c 5 ns delay 7 ns 7 ns 7 ns • Longest path is thus 7 ns • Fastest frequency a 5 ns a to c through +: 2 ns a to d through + and *: 7 ns b to d through + and *: 7 ns b to d through *: 5 ns 2 ns – – – – d (2,7,7,5) = 7 ns Digital Design Copyright © 2006 Frank Vahid 52 Critical Path Considering Wire Delays • Real wires have delay too – Must include in critical path • Example shows two paths – 1980s/1990s: Wire delays were tiny compared to logic delays – But wire delays not shrinking as fast as logic delays • Wire delays may even be greater than logic delays! • Must also consider register setup and hold times, also add to path • Then add some time to the computed path, just to be safe a b 0.5 ns 0.5 ns + 2 ns 0.5 ns 3 ns • Trend clk 3 ns – Each is 0.5 + 2 + 0.5 = 3 ns c 3 ns – e.g., if path is 3 ns, say 4 ns instead Digital Design Copyright © 2006 Frank Vahid 53 A Circuit May Have Numerous Paths • Paths can exist – In the datapath – In the controller – Between the controller and datapath – May be hundreds or thousands of paths • Timing analysis tools that evaluate all possible paths automatically very helpful Digital Design Copyright © 2006 Frank Vahid s Combinational logic a 8 8 d tot_ld ld tot t ot_clr c clr 8 (c) tot_lt_s n1 8-bit < n0 8-bit adder 8 tot_lt_s Datapath s1 clk s0 (b) (a) State register 54 Behavioral Level Design: C to Gates 5.5 C code !go S0 go S1 (i<256)’ sum = 0 i=0 S2 i<256 sum=sum+abs(A[i]-B[i]) S3 i=i+1 S4 sad_reg = sum a int SAD (byte A[256], byte B[256]) // not quite C syntax { uint sum; short uint I; sum = 0; i = 0; while (i < 256) { sum = sum + abs(A[i] – B[i]); i = i + 1; } return sum; } • Earlier sum-of-absolute-differences example – Started with high-level state machine – C code is an even better starting point -- easier to understand Digital Design Copyright © 2006 Frank Vahid 55 Behavioral-Level Design: Start with C (or Similar Language) • Replace first step of RTL design method by two steps – Capture in C, then convert C to high-level state machine – How convert from C to high-level state machine? Step 1A: Capture in C a Step 1B: Convert to high-level state machine Digital Design Copyright © 2006 Frank Vahid 56 Converting from C to High-Level State Machine • Convert each C construct to equivalent states and transitions • Assignment statement – Becomes one state with assignment target= expression target = expression; a • If-then statement – Becomes state with condition check, transitioning to “then” statements if condition true, otherwise to ending state • “then” statements would also be converted to states Digital Design Copyright © 2006 Frank Vahid !cond if (cond) { // then stmts } cond a (then stmts) (end) 57 Converting from C to High-Level State Machine • If-then-else – Becomes state with condition check, transitioning to “then” statements if condition true, or to “else” statements if condition false !cond if (cond) { // then stmts } else { // else stmts } cond (then stmts) (else stmts) a (end) • While loop statement – Becomes state with condition check, transitioning to while loop’s statements if true, then transitioning back to condition check Digital Design Copyright © 2006 Frank Vahid !cond while (cond) { // while stmts } cond (while stmts) a (end) 58 Simple Example of Converting from C to HighLevel State Machine Inputs: uint X, Y Outputs: uint Max !(X>Y) !(X>Y) X>Y X>Y if (X > Y) { Max = X; (then stmts) (else stmts) Max=X Max=Y } else { Max = Y; (end) (end) } a a (a) (b) (c) • Simple example: Computing the maximum of two numbers – Convert if-then-else statement to states (b) – Then convert assignment statements to states (c) Digital Design Copyright © 2006 Frank Vahid 59 Example: Converting Sum-of-Absolute-Differences C code to High-Level State Machine • Convert each construct to states – Simplify when possible, e.g., merge states • From high-level state machine, follow RTL design method to create circuit • Thus, can convert C to gates using straightforward automatable process – Not all C constructs can be efficiently converted – Use C subset if intended for circuit – Can use languages other than C, of course Inputs: byte A[256, B[256] bit go; Output: int sad main() { uint sum; short uint I; while (1) { !(!go) !go !go sad = sum; !go (d) (c) (b) while (i < 256) { sum = sum + abs(A[i] - B[i]); i = i + 1; } go sum=0 i=0 i=0 sum = 0; i = 0; } !go sum=0 while (!go); } go go !go go a (a) !go sum=0 i=0 go sum=0 i=0 sum=0 i=0 !(i<256) i<256 !(i<256) i<256 sum=sum + abs i=i+1 !(i<256) i<256 sum=sum + abs i=i+1 while stmts sad = sum (g) Digital Design Copyright © 2006 Frank Vahid (e) sad = sum (f) 60 4.10 Register Files • MxN register file component provides er C C efficient access to M N- er t s ompu t ? bit-wide registers 32 8 d0d0 4 162 4 cthe car om ompu – If we have many al com rt the car's r d1 registers but only need al r F 4 a0 e tn n c r i0 access one or two at a F e c i1 time, a register file is i3-i0 a1 d2 more efficient – Ex: Above-mirror display (earlier example), but this d3 d15 e time having 16 32-bit e load registers load • Too many wires, and big mux is too slow Digital Design Copyright © 2006 Frank Vahid a load reg0 load reg0 too much fanout load reg1 huge mux T 32 8 A 8 load reg2 I 32 8 load reg3 load reg15 o th or mi dis T mi T r rorod 32-bit 8-bit r 16x41×1 r o i1 ve a o y a ve d d y - DD 8 i0 i0 i2 congestion M 32 8 i15i3 s1 s0 s3-s0 x y 61 Register File • Instead, want component that has one data input and one data output, and allows us to specify which internal register to write and which to read 32 4 32 W_data R_data W_addr R_addr W_en 16×32 register file a 4 R_en a Digital Design Copyright © 2006 Frank Vahid 62 Register File Timing Diagram • Can write one register and read one register each clock cycle – May be same register 32 32 W_data R_data W_addr R_addr 2 2 W_en R_en 4x32 register file Digital Design Copyright © 2006 Frank Vahid 63 5.6 • Register-transfer level design instantiates datapath components to create datapath, controlled by a controller – A few more components are often used outside the controller and datapath M words Memory Components • MxN memory – M words, N bits wide each • Several varieties of memory, which we now introduce Digital Design Copyright © 2006 Frank Vahid N-bits wide each M×N memory 64 Random Access Memory (RAM) • RAM – Readable and writable memory – “Random access memory” • Strange name – Created several decades ago to contrast with sequentially-accessed storage like tape drives – Logically same as register file – Memory with address inputs, data inputs/outputs, and control 32 4 32 W_data R_data W_addr R_addr W_en 16×32 register file 4 R_en Register file from Chpt. 4 • RAM usually just one port; register file usually two or more – RAM vs. register file • RAM typically larger than roughly 512 or 1024 words • RAM typically stores bits using a bit storage approach that is more efficient than a flip flop • RAM typically implemented on a chip in a square rather than rectangular shape – keeps longest wires (hence delay) short Digital Design Copyright © 2006 Frank Vahid 32 data 10 addr rw 1024× 32 RAM en RAM block symbol 65 RAM Internal Structure 32 10 data rw wdata(N-1) wdata(N-2) wdata0 Let A = log2M addr 1024x32 RAM en d0 addr0 addr1 addr(A-1) addr clk en rw word enable a0 a1 AxM d1 decoder a(A-1) e bit storage block (aka “cell”) word data cell word word enable enable rw data d(M-1) to all cells rdata(N-1) rdata(N-2) rdata0 RAM cell • Similar internal structure as register file Digital Design Copyright © 2006 Frank Vahid – Decoder enables appropriate word based on address inputs – rw controls whether cell is written or read – Let’s see what’s inside each RAM cell 66 Static RAM (SRAM) SRAM cell 32 10 addr rw data’ data data cell d 1024x32 RAM d’ en a 0 word enable • “Static” RAM cell SRAM cell – 6 transistors (recall inverter is 2 transistors) – Writing this cell data’ 0 data 1 d • word enable input comes from decoder • When 0, value d loops around inverters a 1 0 – That loop is where a bit stays stored • When 1, the data bit value enters the loop – data is the bit to be stored in this cell – data’ enters on other side – Example shows a “1” being written into cell data 1 Digital Design Copyright © 2006 Frank Vahid 1 word enable word enable d 0 data’ cell d’ 0 a 67 Static RAM (SRAM) 32 10 data addr rw 1024x32 RAM en • “Static” RAM cell SRAM cell • Somewhat trickier • When rw set to read, the RAM logic sets both data and data’ to 1 • The stored bit d will pull either the left line or the right bit down slightly below 1 • “Sense amplifiers” detect which side is slightly pulled down data’ 1 data 1 – Reading this cell d 1 0 a word enable 1 1 <1 To sense amplifiers – The electrical description of SRAM is really beyond our scope – just general idea here, mainly to contrast with DRAM... Digital Design Copyright © 2006 Frank Vahid 68 Dynamic RAM (DRAM) 32 10 data addr rw 1024x32 RAM en DRAM cell • “Dynamic” RAM cell – 1 transistor (rather than 6) – Relies on large capacitor to store bit • Write: Transistor conducts, data voltage level gets stored on top plate of capacitor • Read: Just look at value of d • Problem: Capacitor discharges over time – Must “refresh” regularly, by reading d and then writing it right back data cell word enable capacitor slowly discharging (a) data enable d Digital Design Copyright © 2006 Frank Vahid d discharges (b) 69 Comparing Memory Types • Register file – Fastest – But biggest size • SRAM – Fast – More compact than register file • DRAM MxN Memory implemented as a: register file SRAM DRAM – Slowest • And refreshing takes time – But very compact • Use register file for small items, SRAM for large items, and DRAM for huge items Size comparison for same number of bits (not to scale) – Note: DRAM’s big capacitor requires a special chip design process, so DRAM is often a separate chip Digital Design Copyright © 2006 Frank Vahid 70 Reading and Writing a RAM clk 2 1 addr 9 13 data 500 999 rw 3 9 Z addr 500 1 means write en • clk data rw valid setup time valid hold time setup time 500 access time RAM[9] RAM[13] now equals 500 now equals 999 Writing Z (b) – Put address on addr lines, data on data lines, set rw=1, en=1 • Reading – Set addr and en lines, but put nothing (Z) on data lines, set rw=0 – Data will appear on data lines • Don’t forget to obey setup and hold times – In short – keep inputs stable before and after a clock edge Digital Design Copyright © 2006 Frank Vahid 71 RAM Example: Digital Sound Recorder wire microphone en rw addr data 4096× 16 RAM 16 analog-todigital converter ad_buf ad_ld 12 Ra Rrw Ren processor digital-toanalog converter wire da_ld • Behavior speaker – Record: Digitize sound, store as series of 4096 12-bit digital values in RAM • We’ll use a 4096x16 RAM (12-bit wide RAM not common) – Play back later – Common behavior in telephone answering machine, toys, voice recorders • To record, processor should read a-to-d, store read values into successive RAM words – To play, processor should read successive RAM words and enable d-to-a Digital Design Copyright © 2006 Frank Vahid 72 RAM Example: Digital Sound Recorder 4096x16 RAM • RTL design of processor – Create high-level state machine – Begin with the record behavior – Keep local register a • Stores current address, ranges from 0 to 4095 (thus need 12 bits) – Create state machine that counts from 0 to 4095 using a • For each a – Read analog-to-digital conv. » ad_ld=1, ad_buf=1 – Write to RAM at address a » Ra=a, Rrw=1, Ren=1 Digital Design Copyright © 2006 Frank Vahid 16 analog-todigital converter ad_buf ad_ld 12 digital-toanalog converter Ra Rw Ren processor da_ld Record behavior Local register: a (12 bits) a<4095 S T a=0 ad_ld=1 ad_buf=1 Ra=a Rrw=1 Ren=1 a U a=a+1 a=4095 73 RAM Example: Digital Sound Recorder – Now create play behavior – Use local register a again, create state machine that counts from 0 to 4095 again • For each a – Read RAM – Write to digital-to-analog conv. • Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus – The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play Digital Design Copyright © 2006 Frank Vahid 4096x16 RAM data bus 16 analog-todigital converter ad_buf ad_ld 12 digital-toanalog converter Ra Rw Ren processor da_ld Play behavior Local register: a (12 bits) a<4095 V W a=0 ad_buf=0 Ra=a Rrw=0 Ren=1 a X da_ld=1 a=a+1 a=4095 74 Read-Only Memory – ROM • Memory that can only be read from, not written to 32 data 10 addr – Data lines are output only – No need for rw input rw 1024× 32 RAM en • Advantages over RAM – Compact: May be smaller – Nonvolatile: Saves bits even if power supply is turned off – Speed: May be faster (especially than DRAM) – Low power: Doesn’t need power supply to save bits, so can extend battery life • Choose ROM over RAM if stored data won’t change (or won’t change often) RAM block symbol 32 data 10 addr 1024x32 ROM en ROM block symbol – For example, a table of Celsius to Fahrenheit conversions in a digital thermometer Digital Design Copyright © 2006 Frank Vahid 75 Read-Only Memory – ROM 32 data 10 addr 1024x32 ROM Let A = log2M en d0 ROM block symbol addr0 addr1 addr(A-1) addr a0 a1 AxM d1 decoder a(A-1) e clk word enable bit storage block (aka “cell”) word data word word enable enable data d(M-1) en rdata(N-1) rdata(N-2) rdata0 ROM cell • Internal logical structure similar to RAM, without the data input lines Digital Design Copyright © 2006 Frank Vahid 76 ROM Types • If a ROM can only be read, how are the stored bits stored in the first place? – Storing bits in a ROM known as programming – Several methods • Mask-programmed ROM – Bits are hardwired as 0s or 1s during chip manufacturing • 2-bit word on right stores “10” • word enable (from decoder) simply passes the hardwired value through transistor 1 data line cell 0 data line cell word enable – Notice how compact, and fast, this memory would be Digital Design Copyright © 2006 Frank Vahid 77 ROM Types • Fuse-Based Programmable ROM – Each cell has a fuse – A special device, known as a programmer, blows certain fuses (using higher-than-normal voltage) • Those cells will be read as 0s (involving some special electronics) • Cells with unblown fuses will be read as 1s • 2-bit word on right stores “10” – Also known as One-Time Programmable (OTP) ROM Digital Design Copyright © 2006 Frank Vahid 1 data line cell 1 data line cell a word enable fuse blown fuse 78 ROM Types • Erasable Programmable ROM (EPROM) • Electrons become trapped in the gate • Only done for cells that should store 0 • Other cells (without electrons trapped in gate) will be 1 – 2-bit word on right stores “10” • Details beyond our scope – just general idea is necessary here floating-gate transistor – Uses “floating-gate transistor” in each cell – Special programmer device uses higherthan-normal voltage to cause electrons to tunnel into the gate o tr word enable ting ar e ta t g data line data line cell cell 1 0 eÐeÐ trapped electrons – To erase, shine ultraviolet light onto chip • Gives trapped electrons energy to escape • Requires chip package to have window Digital Design Copyright © 2006 Frank Vahid 79 ROM Types • Electronically-Erasable Programmable ROM (EEPROM) – Similar to EPROM • Uses floating-gate transistor, electronic programming to trap electrons in certain cells – But erasing done electronically, not using UV light – Erasing done one word at a time • Flash memory – Like EEPROM, but all words (or large blocks of words) can be erased simultaneously – Become common relatively recently (late 1990s) • Both types are in-system programmable – Can be programmed with new stored bits while in the system in which the ROM operates • Requires bi-directional data lines, and write control input • Also need busy output to indicate that erasing is in progress – erasing takes some time Digital Design Copyright © 2006 Frank Vahid 32 data 10 addr en 1024x32 EEPROM write busy 80 ROM Example: Digital Telephone Answering Machine Using a Flash Memory • Want to record the outgoing announcement – When rec=1, record digitized sound in locations 0 to 4095 – When play=1, play those stored sounds to digital-toanalog converter • What type of memory? – Should store without power supply – ROM, not RAM – Should be in-system programmable – EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM – Will always erase entire memory when reprogramming – Flash better than EEPROM Digital Design Copyright © 2006 Frank Vahid 4096x16 Flash “We’re not home.” busy analog-todigital converter 16 ad_buf ad_ld 12 Ra Rrw Ren er processor record microphone bu digital-toanalog converter da_ld rec play speaker 81 ROM Example: Digital Telephone Answering Machine Using a Flash Memory • High-level state machine – Once rec=1, begin erasing flash by setting er=1 – Wait for flash to finish erasing by waiting for bu=0 – Execute loop that sets local register a from 0 to 4095, reading analog-todigital converter and writing to flash for each a 4096x16 Flash analog-todigital converter ad_buf 12 ad_ld Ra Rrw Ren er processor bu digital-toanalog converter da_ld rec record play microphone S a=0 er=1 rec Digital Design Copyright © 2006 Frank Vahid 16 speaker Local register: a (13 bits) bu a<4096 T bu’ U er=0 ad_ld=1 ad_buf=1 Ra=a Rrw=1 Ren=1 a=a+1 a V a=4096 82 Blurring of Distinction Between ROM and RAM • We said that – RAM is readable and writable – ROM is read-only ROM Flash EEPROM RAM a NVRAM • But some ROMs act almost like RAMs – EEPROM and Flash are in-system programmable • Essentially means that writes are slow – Also, number of writes may be limited (perhaps a few million times) • And, some RAMs act almost like ROMs – Non-volatile RAMs: Can save their data without the power supply • One type: Built-in battery, may work for up to 10 years • Another type: Includes ROM backup for RAM – controller writes RAM contents to ROM before turning off • New memory technologies evolving that merge RAM and ROM benefits – e.g., MRAM • Bottom line – Lot of choices available to designer, must find best fit with design goals Digital Design Copyright © 2006 Frank Vahid 83 Hierarchy and Abstraction • Abstraction – Hierarchy often involves not just grouping items into a new item, but also associating higher-level behavior with the new item, known as abstraction • e.g., an 8-bit adder has an understandable high-level behavior – it adds two 8-bit binary numbers – Frees designer from having to remember, or even from having to understand, the lower-level details Digital Design Copyright © 2006 Frank Vahid a7.. a0 b7.. b0 8-bit adder co ci s7.. s0 84 Hierarchy and Composing Larger Components from Smaller Versions • A common task is to compose smaller components into a larger one – Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate a • Can simple compose the 9-input gate from several 3-input gates – Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux • s2 selects either top or bottom 4x1 • s1s0 select particular 4x1 input • Implements 8x1 mux – 8 data inputs, 3 selects, one output P ro vin c e1 Digital Design Copyright © 2006 Frank Vahid 85 Hierarchy and Composing Larger Components from Smaller Versions • Composing memory very common • Making memory words wider – Easy – just place memories side-by-side until desired width obtained – Share address/control lines, concatenate data lines – Example: Compose 1024x8 ROMs into 1024x32 ROM 10 addr en addr 1024x8 ROM en data addr 1024x8 ROM en data addr 1024x8 ROM en data addr 1024x8 ROM en data 8 8 8 8 data(31..0) 10 1024x32 ROM data Digital Design Copyright © 2006 Frank Vahid 32 86 Hierarchy and Composing Larger Components from Smaller Versions • 11 a9..a0 Creating memory with more words – Put memories on top of one another until the number of desired words is achieved – Use decoder to select among the memories • Can use highest order address input(s) as decoder input • Although actually, any address line could be used addr a a10 just chooses which memory to access Digital Design Copyright © 2006 Frank Vahid 0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1x2 d0 dcd i0 e d1 1024x8 ROM en data 8 en – Example: Compose 1024x8 memories into 2048x8 memory a10a9a8 a0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 a10 addr addr 1024x8 ROM 11 2048x8 ROM en data 8 data addr 8 1024x8 ROM en data a addr 1024x8 ROM en data To create memory with more words and wider words, can first compose to enough words, then widen. 87 Chapter Summary – Modern digital design involves creating processor-level components – Four-step RTL method can be used • 1. High-level state machine 2. Create datapath 3. Connect datapath to controller 4. Derive controller FSM – Several example • Control dominated, data dominated, and mix – Determining fastest clock frequency • By finding critical path – Behavioral-level design – C to gates • By using method to convert C (subset) to high-level state machine – Additional RTL components • Memory: RAM, ROM • Queues – Hierarchy: A key concept used throughout Chapters 2-5 Digital Design Copyright © 2006 Frank Vahid 88