ECE 368: A Tour by Example of Non-Trivial Circuit Design and VHDL Description
Lecture Notes # 4
Shantanu Dutt
Electrical & Computer Eng.
University of Illinois at Chicago

Outline
• Circuit Design Problem
• Solution Approaches:
  – Truth Table (TT) vs. Computational/Algorithmic
    – Yes, hardware, just like software, can implement any algorithm!
  – Flat vs. Divide-&-Conquer
  – Divide-&-Conquer:
    • Associative operations/functions
    • General operations/functions
  – Expressing the hardware soln. using programming-language constructs, incl. recursion and iteration
• Circuit Synthesis
  – Translation of a programming-language description to a digital ckt.
• Summary

Circuit Design Problem
• Design an 8-bit greater-than comparator that compares two 8-bit #s available in two registers A[7..0] and B[7..0] and o/ps F = 1 if A > B and F = 0 if A <= B.
• Approach 1: The TT approach -- Write down a TT with 16 input bits (2^16 = 65,536 rows), derive a logic expression from it, minimize it, obtain a gate-based realization, etc.!
    A          B          F
    00000000   00000000   0
    00000000   00000001   0
    ...
    00000001   00000000   1
    ...
    11111111   11111111   0
  – Too cumbersome and time-consuming
  – Fraught with the possibility of human error
  – Difficult to formally prove correct (i.e., prove correctness w/o exhaustive testing)
  – Will generally have high hardware cost and delay

Circuit Design Problem (contd)
• Approach 2: Think computationally/algorithmically about what the ckt is supposed to compute:
• Approach 2(a): Flat algorithmic approach:
  – Note: A TT can be expressed as a sequence of "if-then-else"s
  – If A = 00000000 and B = 00000000 then F = 0
    else if A = 00000000 and B = 00000001 then F = 0
    ……….
    else if A = 00000001 and B = 00000000 then F = 1
    ……….
  – Essentially a re-hashing of the TT: same problems as the TT approach
  – Need to think computationally & structurally (i.e., based on the structure of the problem at hand) at a higher level!
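To make the size of the flat TT concrete, here is a minimal sketch that enumerates every row of the 8-bit comparator's TT and counts the rows with F = 1. Python is used only as a software model, and the helper name build_flat_tt is hypothetical.

```python
# Software model (Python) of the flat TT for the 8-bit greater-than comparator.
# Hypothetical helper name: build_flat_tt. Each row is (A, B, F), F = 1 iff A > B.

def build_flat_tt(width: int = 8) -> list[tuple[int, int, int]]:
    """Enumerate every (A, B) input combination: 2**(2*width) rows."""
    rows = []
    for a in range(2 ** width):
        for b in range(2 ** width):
            rows.append((a, b, 1 if a > b else 0))
    return rows


if __name__ == "__main__":
    tt = build_flat_tt()
    print("rows:", len(tt))                                 # 65536 (= 2**16)
    print("rows with F = 1:", sum(row[2] for row in tt))    # 32640
    for a, b, f in tt[:2]:                                  # same layout as the TT above
        print(f"{a:08b} {b:08b} {f}")
```

Even this tiny design needs 65,536 rows, of which 256 * 255 / 2 = 32,640 have F = 1, which is why the flat TT does not scale.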
Circuit Design Problem (contd)
• Approach 2(b): Structural algorithmic approach:
  – Be more innovative: think of the structure/properties of the computational problem
  – E.g., think about whether the problem can be solved in a hierarchical or divide-&-conquer (D&C) manner:
    [Figure: D&C tree. Root problem A is broken into subproblems A1 and A2, which are in turn broken into A1,1, A1,2 and A2,1, A2,2; the solns. to A1 and A2 are stitched up to form the complete soln. to A. This is done recursively until the subproblem size is such that a TT-based design is doable.]
  – D&C approach: See if the problem can be "broken up" into 2 or more smaller subproblems whose solns. can be "stitched up" to give a soln. to the parent problem
  – Do this recursively for each large subproblem until the subproblems are small enough for a TT-based solution
  – If the subproblems are of a similar kind (but of smaller size) to the root problem, then the break-up and stitching will also be similar

Shift Gears: Design of a Parity Detection Circuit: A Series of XORs
[Figure: (a) a linearly connected circuit: x(0) and x(1) feed the first XOR, and each later XOR combines the previous result with the next input x(2), x(3), ..., x(15) to produce f; (b) a 16-bit parity tree with inputs x(15) ... x(0), internal wires w(j,i), and root w(0,0) = f.]
  Design (b) computes
  f = (((x(15) xor x(14)) xor (x(13) xor x(12))) xor ((x(11) xor x(10)) xor (x(9) xor x(8)))) xor (((x(7) xor x(6)) xor (x(5) xor x(4))) xor ((x(3) xor x(2)) xor (x(1) xor x(0))))
• No concurrency in design (a): the actual problem has available concurrency, though, and it is not exploited in the above "linear" design
• Complete sequentialization, leading to a delay that is linear in the # of bits n: delay = (n-1)*td, where td = delay of 1 gate
• All the available concurrency is exploited in design (b): a parity tree.
• Question: When can we have a tree-structured circuit for a chain of the same operation on multiple operands?
• Answer: (1) First of all, when the operation makes sense for any # of operands. (2) It should be possible to break it down into smaller-size operations. (3) Finally, when the operation is associative. An operation "x" is said to be associative if: a x b x c = (a x b) x c = a x (b x c).
• Thus if we have 4 operands (3 operations) a x b x c x d, we can either perform this as a x (b x (c x d)) [getting a linear delay of 3 units] or as (a x b) x (c x d) [getting a logarithmic (base 2) delay of 2 units and exploiting the available concurrency, due to the fact that "x" is associative].
• We can extend this idea to n operands (& n-1 operations) to perform as many of the pairwise operations as possible in parallel (& do this recursively for every level of remaining operations), similar to design (b) for the parity detector [xor is an associative operation!], and thus get a (log2 n) delay:
  Delay = (# of levels in the XOR tree) * td = (log2 n) * td
An example of simple designer ingenuity: a bad design would have resulted in a linear delay, and the VHDL code & the synthesis tool would have been at its mercy.

D&C for Associative Operations
• Let f(x(n-1), ….., x(0)) be an associative function.
• What is the D&C principle involved in the design of an n-bit xor/parity function? Can it also lead automatically to a tree-based ckt?
  [Figure: f(x(n-1), .., x(0)) is computed by the stitch-up function f(a, b), which is the same as the original function but for 2 inputs, where a = f(x(n-1), .., x(n/2)) and b = f(x(n/2-1), .., x(0)).]
• Using the D&C approach for an associative operation results in the stitch-up function being the same as the original function (not the case for non-associative
operations), but with a constant # of operands (2, if the original problem is broken into 2 subproblems)
• If the two subproblems of the D&C approach are balanced (of the same size, or as close to it as possible), then unfolding the D&C results in a balanced operation tree of the type seen earlier for the xor/parity function

Using Generate Statements for Describing a Tree-Structured Circuit
[Figure: 16-bit parity tree with inputs x(15) ... x(0); level-3 wires w(3,7) ... w(3,0), level-2 wires w(2,3) ... w(2,0), level-1 wires w(1,1), w(1,0), and root w(0,0) = f.]
Note: (a) w(j,i) = wire(j,i); (b) signals of the 2-D array "wire" not shown are unused and will not be synthesized.

  -- 2-input XOR gate used by parity_tree (analyze it first)
  library ieee;
  use ieee.std_logic_1164.all;

  entity xor_2 is
    generic (td : time := 2 ns);  -- propagation delay of gate
    port (a, b : in std_logic;
          c    : out std_logic);
  end entity xor_2;

  architecture behav of xor_2 is
  begin
    c <= a xor b after td;
  end architecture behav;

  library ieee;
  use ieee.std_logic_1164.all;

  entity parity_tree is  -- a (2**k)-bit parity tree
    generic (k : natural; gate_delay : time := 2 ns);  -- n = 2**k is the # of inputs
    port (x : in std_logic_vector(2**k - 1 downto 0);
          f : out std_logic);
  end entity parity_tree;

  architecture struct of parity_tree is
    type matrix is array (k-1 downto 0, 2**k - 1 downto 0) of std_logic;
    signal wire : matrix;
  begin
    outer_loop: for j in k-1 downto 0 generate        -- one iteration per tree level
      inner_loop: for i in 0 to 2**j - 1 generate     -- level j has 2**j XOR gates
        first_level: if j = k-1 generate              -- level nearest the inputs
          xor_gates_level1: entity work.xor_2(behav)  -- direct instantiation
            generic map (td => gate_delay)            -- pass gate delay to xor
            port map (x(2*i), x(2*i+1), wire(j,i));
        end generate first_level;
        lower_levels: if j < k-1 generate             -- combine two o/ps of level j+1
          xor_gates_lower: entity work.xor_2(behav)
            generic map (td => gate_delay)
            port map (wire(j+1, 2*i), wire(j+1, 2*i+1), wire(j,i));
        end generate lower_levels;
      end generate inner_loop;
    end generate outer_loop;
    f <= wire(0,0);
  end architecture struct;

Comparator Circuit Design Using D&C
• Useful property: At any level, the comparison of the MS (most significant) half determines the o/p if its result is > or <; else the comparison of the LS ½ determines the o/p
• Can thus break up the problem at any level into MS ½ and LS ½ comparisons, & based on their results determine which o/p to choose for the higher-level (parent) result
• Is this associative? Not sure
• For a non-associative func, determine its property(ies) that allow determining a correct stitch-up function (requires ingenuity, solid thinking)
  [Figure: D&C tree for the comparator. A: Comp. A[7..0],B[7..0]: if the A1 result is > or <, take the A1 result, else take the A2 result. A1: Comp A[7..4],B[7..4]: if the A1,1 res. is > or <, take the A1,1 res., else take the A1,2 res. A2: Comp A[3..0],B[3..0]. A1,1: Comp A[7..6],B[7..6]: if the A1,1,1 res. is > or <, take the A1,1,1 res., else take the A1,1,2 res. A1,2: Comp A[5..4],B[5..4]. A1,1,1: Comp A[7],B[7]. A1,1,2: Comp A[6],B[6]. The solns. to A1 and A2 are stitched up to form the complete soln. to A.]
The TT of a leaf subproblem may be derived directly, or by first thinking of and expressing its computation in a high-level programming language and then converting it to a TT.
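As a minimal sketch of the second route, the leaf computation can be written as ordinary code and its TT obtained by enumerating all inputs. Python serves only as the "high-level programming language" here, the helper names one_bit_compare and leaf_truth_table are hypothetical, and the (f1, f2) encoding follows the leaf-level pseudocode in these notes.

```python
# Derive the leaf (1-bit) comparator TT from a high-level description.
# Encoding follows the leaf pseudocode in these notes:
#   (f1, f2) = (1, 0) for A > B, (0, 0) for A < B, (0, 1) for A = B
# Hypothetical helper names: one_bit_compare, leaf_truth_table.

def one_bit_compare(a: int, b: int) -> tuple[int, int]:
    if a == b:
        return (0, 1)   # "=": the parent stitch logic should use the LS half's o/ps
    if a < b:
        return (0, 0)   # "<"
    return (1, 0)       # ">"


def leaf_truth_table() -> list[tuple[int, int, int, int]]:
    """Enumerate all input combinations to turn the description into TT rows."""
    return [(a, b) + one_bit_compare(a, b) for a in (0, 1) for b in (0, 1)]


if __name__ == "__main__":
    print("A B f1 f2")
    for row in leaf_truth_table():
        print(*row)
```

The enumeration prints the four rows (0 0 0 1), (0 1 0 0), (1 0 1 0), (1 1 0 1), i.e., the leaf TT.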
Leaf subproblem (small enough to be designed using a TT):
  If A[i] = B[i] then { f1(i) = 0; f2(i) = 1 }
      /* the f2(i) o/p is an i/p to the stitch logic; f2(i) = 1 means the f1( ), f2( ) o/ps of the LS ½ of this subtree should be selected by the stitch logic as its o/ps */
  else if A[i] < B[i] then { f1(i) = 0; /* indicates < */ f2(i) = 0 }
      /* f2(i) = 0 indicates that the f1(i), f2(i) o/ps should be selected by the stitch logic as its o/ps */
  else if A[i] > B[i] then { f1(i) = 1; /* indicates > */ f2(i) = 0 }

  A[i] B[i] | f1(i) f2(i)
   0    0   |   0     1
   0    1   |   0     0
   1    0   |   1     0
   1    1   |   0     1
  (2-bit, 2-o/p comparator: 2 i/p bits A[i], B[i]; this is the "1-bit comparator" of the figures)

Comparator Circuit Design Using D&C (contd.)
• Once the D&C tree is formulated, it is easy to get the low-level & stitch-up designs
• Stitch-up design shown here (for the same D&C tree as above)
Stitch-up logic details:
  If f2(i) = 0 then { my_op1 = f1(i); my_op2 = f2(i) }   /* select MS ½ comp. o/ps */
  else { my_op1 = f1(i-1); my_op2 = f2(i-1) }           /* select LS ½ comp. o/ps */
Stitch-up logic, either as a direct design or as a compact TT (X = don't care):
  – Direct design: a 2-bit 2:1 Mux with select f2(i), I0 = (f1(i), f2(i)), I1 = (f1(i-1), f2(i-1)), and o/p my_op = (my_op1, my_op2)
  – Compact TT:
      f1(i) f2(i) f1(i-1) f2(i-1) | my_op1   my_op2
        X     0     X       X     | f1(i)    f2(i)
        X     1     X       X     | f1(i-1)  f2(i-1)

Comparator Circuit Design Using D&C – Final Design
[Figure: eight 1-bit comparators on A[7],B[7] ... A[0],B[0] produce the 2-bit o/ps f(7) ... f(0). A first level of four 2-bit 2:1 Muxes produces my(3) ... my(0), each selected by the f2 bit of its MS i/p (e.g., f2(7) = f(7)(2) selects between f(7) and f(6)); a second level of two Muxes, selected by my(3)(2) and my(1)(2), produces my(5) and my(4); a final Mux, selected by my(5)(2), produces my(6), and F = my1(6), the f1 bit of my(6). There are log n levels of Muxes.]
• H/W_cost(8-bit comp.) = 7 * H/W_cost(2-bit 2:1 Mux) + 8 * H/W_cost(2-bit comp.)
• H/W_cost(n-bit comp.) = (n-1) * H/W_cost(2-bit 2:1 Mux) + n * H/W_cost(2-bit comp.)
• Delay(8-bit comp.) = 3 * (delay of 2:1 Mux) + delay of 2-bit comp.
• Note parallelism at work: multiple logic blocks are processing simultaneously.
• Delay(n-bit comp.) = (log2 n) * (delay of 2:1 Mux) + delay of 2-bit comp.

Comparator Circuit Design Using D&C – Behavioral Description using a High-Level Language
• Recursive Description:
  Procedure Compare(A[m..k], B[m..k])   /* compares bits m down to k; returns (f1, f2) */
  Begin
    if m-k >= 1 then {                  /* more than 1 bit: split into MS and LS parts */
      h = (m-k)/2;                      /* integer division */
      (f1, f2) = Compare(A[m..m-h], B[m..m-h]);
      if f2 = 0 then return(f1, f2)     /* result has been determined by the MS comp. */
      else return(Compare(A[m-1-h..k], B[m-1-h..k]));   /* MS parts equal: LS comp. decides */
    }
    else {                              /* m-k = 0: single-bit comparison problem */
      if A[m] > B[m] then return(1, 0)
      else if A[m] < B[m] then return(0, 0)
      else return(0, 1)
    }
  End
• Main program: Compare(A[7..0], B[7..0]);
• Problem: The design has been sequentialized: perform the MS comparison, look at its results and, if needed, perform the LS comparison, instead of performing the MS and LS comparisons simultaneously.
• Thus no parallelism! Delay is linear in n, as opposed to log n w/ parallelism
• Limitation of regular programming languages in specifying parallelism
• Need a Hardware Description Language (HDL) for specifying parallelism. VHDL & Verilog are such languages
[Figure: sequential realization: start the MS 4-bit comparison; only if its o/p indicates "=" is the LS 4-bit comparison started (enabled); a 2:1 Mux selects the o/p.]
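Both the parallel D&C tree and the sequentialized recursive procedure can be modeled in software to check the claims above. The sketch below is a Python model under stated assumptions: the names leaf, stitch, tree_compare, sequential_compare, and to_bits are hypothetical; operands are MSB-first bit lists; and the MS part has n - n//2 bits, which matches the split of the recursive Compare procedure and also handles odd n. It checks the 8-bit tree exhaustively, counts Mux levels and Muxes, and counts how many leaf comparisons the sequential procedure performs one after another.

```python
# Software model of the D&C tree comparator and of the sequentialized recursive
# procedure. Subresults are pairs (f1, f2):
#   (1, 0) = ">", (0, 0) = "<", (0, 1) = "=" (defer to the LS half).
# Hypothetical names: leaf, stitch, tree_compare, sequential_compare, to_bits.

def leaf(a: int, b: int) -> tuple[int, int]:
    """1-bit (2-input-bit) comparator."""
    if a > b:
        return (1, 0)
    if a < b:
        return (0, 0)
    return (0, 1)


def stitch(ms: tuple[int, int], ls: tuple[int, int]) -> tuple[int, int]:
    """2-bit 2:1 Mux, select = f2 of the MS half: 0 -> MS o/ps, 1 -> LS o/ps."""
    return ms if ms[1] == 0 else ls


def tree_compare(a_bits: list[int], b_bits: list[int]):
    """MSB-first bit lists -> ((f1, f2), mux_levels, mux_count).
    Both halves are computed independently, as they would be in parallel hardware."""
    n = len(a_bits)
    if n == 1:
        return leaf(a_bits[0], b_bits[0]), 0, 0
    h = n - n // 2                    # MS part size; equals n/2 when n is a power of 2
    ms, d_ms, c_ms = tree_compare(a_bits[:h], b_bits[:h])
    ls, d_ls, c_ls = tree_compare(a_bits[h:], b_bits[h:])
    return stitch(ms, ls), 1 + max(d_ms, d_ls), 1 + c_ms + c_ls


def sequential_compare(a_bits: list[int], b_bits: list[int], steps: list[int]):
    """The recursive Compare procedure run sequentially: the LS half is examined
    only after the MS half reports '='. steps[0] counts leaf comparisons done
    one after another."""
    n = len(a_bits)
    if n == 1:
        steps[0] += 1
        return leaf(a_bits[0], b_bits[0])
    h = n - n // 2
    ms = sequential_compare(a_bits[:h], b_bits[:h], steps)
    if ms[1] == 0:
        return ms
    return sequential_compare(a_bits[h:], b_bits[h:], steps)


def to_bits(x: int, width: int = 8) -> list[int]:
    """Integer -> MSB-first list of bits."""
    return [(x >> i) & 1 for i in reversed(range(width))]


if __name__ == "__main__":
    # Exhaustive 8-bit check: F = f1 of the root o/p equals (A > B)
    for a in range(256):
        for b in range(256):
            (f1, _), _, _ = tree_compare(to_bits(a), to_bits(b))
            assert f1 == (1 if a > b else 0)
    _, levels, muxes = tree_compare(to_bits(0), to_bits(0))
    print("mux levels:", levels, "muxes:", muxes)          # mux levels: 3 muxes: 7
    steps = [0]
    sequential_compare(to_bits(77), to_bits(77), steps)
    print("sequential leaf steps when A = B:", steps[0])   # 8
```

The tree needs 3 = log2 8 Mux levels and 7 = n - 1 Muxes, while the sequential procedure performs all 8 leaf comparisons back to back in the worst case (A = B), which is the linear delay noted above.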
Comparator Circuit Design Using D&C – Behavioral Description using a High-Level Language
• Iterative Description: flattening the recursion:
  Procedure Compare(A[n-1..0], B[n-1..0])
  Begin
    for i = n-1 downto 0 do {
      if A[i] > B[i] then return(1)
      else if A[i] < B[i] then return(0)
      else if i = 0 and A[0] = B[0] then return(0)   /* all bits equal: A = B */
    }
  End
• Main program: Compare(A[7..0], B[7..0]);
• Same problem of sequentialization: a higher-order bit is compared before the next lower-order bit, and so on, leading to a delay that is linear in the # of bits (as opposed to log n with parallelism)
[Figure: a chain of eight 1-bit comparators on A[7],B[7] ... A[0],B[0], each with start (st) and enable (en) signals and o/ps f(7) ... f(0); logic then selects as F the o/p of the 1st comparator from the left that has st = 0.]

Concurrent Statements & Component Instantiations in VHDL
• Parallelism or concurrency needs to be explicitly specified in an HDL: a synthesis tool will mostly not be able to extract any parallelism from a description (i.e., coding) that does not explicitly expose the parallelism
• VHDL specifies concurrency using concurrent statements
• VHDL expresses iterative and recursive patterns of concurrency using iterative (for) generate statements and conditional (if) generate statements
[Figure: OR gate A computes x = a or b, OR gate B computes y = c or d, and OR gate C computes z = x or y.]
• In the simple ckt to the left, OR gates A and B are supposed to operate in parallel/concurrently
• No way to specify this in a s/w programming language (or in a VHDL behavioral description)

VHDL Description:
  library ieee;
  use ieee.std_logic_1164.all;

  entity simple_ckt is
    port (a, b, c, d : in std_logic;    -- like procedure input/output variable declarations;
          z          : out std_logic);  -- these are input/output ports or wires
  end entity simple_ckt;

  architecture data_flow of simple_ckt is
    signal x, y : std_logic;  -- declaring internal wires
  begin
    x <= a or b;  -- built-in OR function
                  -- may also have "x <= a or b after 2 ns;" for delay,
                  -- and similarly for the other assignment statements
    y <= c or d;  -- the order of these statements does
    z <= x or y;  -- not matter; all operate concurrently
  end architecture data_flow;

  -- OR --

  architecture structural of simple_ckt is
    signal x, y : std_logic;
  begin
    or_A: entity work.or_gate(behav)  -- predefined OR gate (assumed compiled into library work)
      port map (a, b, x);             -- this is "direct instantiation"
    or_B: entity work.or_gate(behav)
      port map (c, d, y);             -- the order of instantiation doesn't matter
    or_C: entity work.or_gate(behav)
      port map (x, y, z);
  end architecture structural;

D&C-based Comparator Design Description using VHDL
VHDL Description – Generate, Recursion, Concurrency
[Figure: for n = 1, a single 1-bit comp. with i/ps a, b and 2-bit o/p f; for n > 1, two (n/2)-bit comps. on A(n-1..n/2), B(n-1..n/2) and A(n/2-1..0), B(n/2-1..0) produce f1 and f2, which feed I0 and I1 of a 2-bit 2:1 Mux with select f1(1) and o/p f (i.e., if the A1 result is > or <, take the A1 result, else take the A2 result).]

  library ieee;
  use ieee.std_logic_1164.all;

  entity tree_comparator is
    generic (n : natural);  -- parameterizes the design size
    port (A, B : in std_logic_vector(n-1 downto 0);
          f    : out std_logic_vector(0 to 1));  -- f(0) = f1, f(1) = f2
  end entity tree_comparator;

  architecture struct_recursive of tree_comparator is
    signal f1, f2 : std_logic_vector(0 to 1);  -- o/ps of the MS and LS sub-comparators
  begin
    simpl_comp: if n = 1 generate
      leaf_comp: entity work.one_bit_comp(behav)
        port map (A(n-1), B(n-1), f);
    end generate simpl_comp;

    compound_comp: if n > 1 generate
      -- the MS part has n - n/2 bits (= n/2 when n is a power of 2)
      comp1: entity work.tree_comparator(struct_recursive)
        generic map (n - n/2)
        port map (A(n-1 downto n/2), B(n-1 downto n/2), f1);
      comp2: entity work.tree_comparator(struct_recursive)
        generic map (n/2)
        port map (A(n/2 - 1 downto 0), B(n/2 - 1 downto 0), f2);
      mux_2bit: entity work.mux_two_to_one(behav)
        generic map (2)                -- # of bits
        port map (f1, f2, f1(1), f);   -- f1 & f2 are the 2-bit data i/ps,
                                       -- f1(1) is the 1-bit select, f is the 2-bit o/p
    end generate compound_comp;
  end architecture struct_recursive;

D&C-based Comparator Design Description using VHDL (contd.)
Component Descriptions (analyze these before tree_comparator):
  library ieee;
  use ieee.std_logic_1164.all;

  entity one_bit_comp is
    port (a, b : in std_logic;
          f    : out std_logic_vector(0 to 1));
  end entity one_bit_comp;

  architecture behav of one_bit_comp is
  begin
    foo: process (a, b) is  -- a delay may be added with an after clause,
    begin                   -- e.g., f(0) <= '1' after 10 ns;
      if a = '1' and b = '0' then                 -- a > b
        f(0) <= '1'; f(1) <= '0';
      elsif a = '0' and b = '1' then              -- a < b
        f(0) <= '0'; f(1) <= '0';
      else                                        -- a = b
        f(0) <= '0'; f(1) <= '1';
      end if;
    end process foo;
  end architecture behav;

  library ieee;
  use ieee.std_logic_1164.all;

  entity mux_two_to_one is
    generic (width : natural);
    port (I0, I1 : in std_logic_vector(0 to width-1);
          sel    : in std_logic;
          f      : out std_logic_vector(0 to width-1));
  end entity mux_two_to_one;

  architecture behav of mux_two_to_one is
  begin
    foo: process (I0, I1, sel) is  -- a delay may be added, e.g., f <= I0 after 5 ns;
    begin
      if sel = '0' then
        f <= I0;
      else                         -- sel = '1'; a plain else avoids inferring a latch
        f <= I1;
      end if;
    end process foo;
  end architecture behav;

Summary
• For complex digital design, we need to think of the "computation" underlying the design in an algorithmic and high-level manner:
  – Is it amenable to the D&C approach (i.e., can it be broken into smaller-sized problems whose outputs can be stitched up)?
  – Are there properties of this computation that can be exploited for a faster, less expensive, modular design?
• The design is then developed in a D&C manner, & the corresponding circuit may be synthesized by describing it compactly using a structural HDL form
• For an operation/func x on n operands (a(n-1) x a(n-2) x …… x a(0)), if x is associative, the D&C approach gives an "easy" stitch-up function, which is x on 2 operands (the o/ps of applying x on each half). As a result, a tree-structured circuit with (log n) delay, instead of a linearly connected circuit with (n) delay, can be synthesized.
• If x is non-associative, more ingenuity and determination of the properties of x are needed to determine the break-up of the function and the stitch-up function.
  The resulting design may or may not be tree-structured.
• A hardware description language with a structural form is useful to describe large circuits with all the designed parallelism, and then have them synthesized automatically.
• VHDL provides special hardware-oriented constructs for the description of hardware that are not available in regular sequential s/w programming languages: especially, concurrency (via data-flow or instantiation statements) and circuit-delay specifications.
• VHDL also has constructs that ease the description of regular-patterned circuits (linear arrays, multi-dimensional arrays, regular trees, etc.) of arbitrary size: generate statements and recursion.

Copyright Notice for RASSP Slides (material included w/ explicit acknowledgement in the next few slides)
Structural VHDL allows the designer to represent a system in terms of components and their interconnections. This module discusses the constructs available in VHDL to facilitate structural descriptions of designs.

Generate Statement --- from RASSP slides
• Structural descriptions of large, but highly regular, structures can be tedious. A VHDL GENERATE statement can be used to include as many concurrent VHDL statements (e.g., component instantiation statements) as needed to describe a regular structure easily.
• In fact, a GENERATE statement may even include other GENERATE statements for more complex devices. Some common examples include the instantiation and connection of multiple identical components, such as half adders to make up a full adder, or exclusive-or gates to create a parity tree.

Generate Statement – For Scheme --- from RASSP slides
• VHDL provides two different schemes of the GENERATE statement: the FOR-scheme and the IF-scheme.
• This slide shows the syntax for the FOR-scheme:
    label : FOR N IN discrete_range GENERATE
      { concurrent statements }
    END GENERATE [label];
• The FOR-scheme is reminiscent of a FOR loop used for sequence control in many programming languages.
• The FOR-scheme generates the included concurrent statements the assigned number of times.
• In the FOR-scheme, all of the generated concurrent statements must be the same.
• The loop variable is created in the GENERATE statement and is undefined outside that statement (i.e., it is not a variable or signal visible elsewhere in the architecture).
• The loop variable in this FOR-scheme case is N. The range can be any valid discrete range. After the GENERATE keyword, the concurrent statements to be generated are stated, and the GENERATE statement is closed with END GENERATE.

Generate Statement – For Scheme Example --- from RASSP slides
• This slide shows an example of the FOR-scheme.
• The code generates an array of AND gates. In this case, the GENERATE statement has been named G1 and instantiates an array of 8 and_gate components.
• The PORT MAP statement maps the interfaces of each of the 8 gates to specific elements of the S1, S2, and S3 vectors by using the FOR loop variable as an index.

Generate Statement – If Scheme --- from RASSP slides
• The second form of the GENERATE statement is the IF-scheme. This scheme allows for conditional generation of concurrent statements.
• One obvious difference between this scheme and the FOR-scheme is that the concurrent statements generated do not all have to be the same.
• While this IF statement may seem reminiscent of the IF-THEN-ELSE constructs in programming languages, note that the GENERATE IF-scheme does not provide ELSE or ELSIF clauses (this holds for VHDL-93; VHDL-2008 added ELSIF and ELSE branches to the if-generate).
• The Boolean expression of the IF statement can be any valid Boolean expression.

Generate Statement – If Scheme Example --- from RASSP slides
• The example here uses the IF-scheme GENERATE statement to modify the and_gate array such that the seventh gate of the array will be an or_gate.
• Another example use of the IF-scheme GENERATE is in the conditional execution of timing checks. Timing checks can be incorporated inside a GENERATE IF-scheme. E.g., the following statement can be used:
    Check_time : IF TimingChecksOn GENERATE
  This allows the Boolean parameter TimingChecksOn to enable timing checks by generating the appropriate concurrent VHDL statements in the description. This parameter can be set in a package or passed as a generic, and can improve simulation speed by shutting off this computational section.
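The parity_tree architecture earlier in these notes uses both schemes: nested FOR-generate loops (outer_loop, inner_loop) and two IF-generate branches (first_level, lower_levels) that choose where each XOR's inputs come from. As a hedged illustration, the Python sketch below mimics how an elaborator unrolls those loops into a netlist and then evaluates that netlist. It is a software model, not VHDL, and the names elaborate_parity_tree and simulate are hypothetical.

```python
# Software model of how the parity_tree architecture elaborates: the nested
# FOR-generate loops create one XOR instance per (j, i), and the two IF-generate
# branches decide whether its i/ps come from x (j = k-1) or from wire level j+1.
# Hypothetical names: elaborate_parity_tree, simulate.
from functools import reduce
from operator import xor


def elaborate_parity_tree(k: int) -> list[tuple]:
    """Return the generated XOR instances as (j, i, in0, in1) source descriptors."""
    gates = []
    for j in range(k - 1, -1, -1):           # outer_loop: for j in k-1 downto 0
        for i in range(2 ** j):              # inner_loop: for i in 0 to 2**j - 1
            if j == k - 1:                   # first_level: if j = k-1 generate
                gates.append((j, i, ("x", 2 * i), ("x", 2 * i + 1)))
            if j < k - 1:                    # lower_levels: if j < k-1 generate
                gates.append((j, i, ("wire", j + 1, 2 * i), ("wire", j + 1, 2 * i + 1)))
    return gates


def simulate(k: int, x_bits: list[int]) -> int:
    """Evaluate the generated netlist; x_bits[m] is x(m). Returns f = wire(0,0)."""
    wire: dict[tuple[int, int], int] = {}

    def value(src: tuple) -> int:
        return x_bits[src[1]] if src[0] == "x" else wire[(src[1], src[2])]

    # Instances were generated from level k-1 (nearest the inputs) down to level 0,
    # so every instance's i/ps are already computed when it is evaluated.
    for j, i, in0, in1 in elaborate_parity_tree(k):
        wire[(j, i)] = value(in0) ^ value(in1)
    return wire[(0, 0)]


if __name__ == "__main__":
    k = 4                                                       # 16-bit parity tree
    print("XOR instances:", len(elaborate_parity_tree(k)))     # 15 (= n - 1)
    x = [1, 0, 1, 1] + [0] * 12                                 # x(0) .. x(15)
    print("f =", simulate(k, x), "| expected", reduce(xor, x))  # f = 1 | expected 1
```

The unrolled netlist has 8 + 4 + 2 + 1 = 15 XOR instances in log2 16 = 4 levels, matching the parity tree in the figure.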