ECE 368 A Tour by Example of Non-Trivial Circuit

ECE 368
A Tour by Example of Non-Trivial Circuit
Design and VHDL Description
Lecture Notes # 4
Shantanu Dutt
Electrical & Computer Eng.
University of Illinois at Chicago
Outline
• Circuit Design Problem
• Solution Approaches:
– Truth Table (TT) vs. Computational/Algorithmic: yes, hardware, just like software, can implement any algorithm!
– Flat vs. Divide-&-Conquer
– Divide-&-Conquer:
• Associative operations/functions
• General operations/functions
• Expressing the hardware soln. using programming
language constructs incl. recursions and iterations
• Circuit Synthesis: translation of a programming-language description to a digital ckt.
• Summary
Circuit Design Problem
• Design an 8-bit greater-than comparator that compares two 8-bit #s available in registers A[7..0] and B[7..0] and o/ps:
F = 1 if A > B and F = 0 if A <= B.
• Approach 1: The TT approach: write down a TT with 16 input bits (2^16 = 65,536 rows), derive a logic expression from it, minimize it, obtain a gate-based realization, etc.!
A          B          F
00000000   00000000   0
00000000   00000001   0
...
00000001   00000000   1
...
11111111   11111111   0
– Too cumbersome and time-consuming
– Fraught with possibility of human error
– Difficult to formally prove correctness (i.e., prove it w/o exhaustive testing)
– Will generally have high hardware cost and delay
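To put a number on "too cumbersome", here is a small software sketch (Python, used only as an executable model; `build_truth_table` is a hypothetical helper, not part of these notes) that enumerates the complete TT of the 8-bit greater-than comparator. It confirms the 65,536 rows that Approach 1 would have to write down and then minimize.

```python
from itertools import product

def build_truth_table(n_bits=8):
    """Enumerate every (A, B) input pair of an n-bit greater-than comparator.

    Returns a list of rows (A_bits, B_bits, F), where A_bits and B_bits are
    n-character binary strings and F = 1 iff A > B.
    """
    rows = []
    for a, b in product(range(2 ** n_bits), repeat=2):
        rows.append((format(a, f"0{n_bits}b"), format(b, f"0{n_bits}b"), int(a > b)))
    return rows

table = build_truth_table(8)
print(len(table))                    # 65536 rows
print(sum(f for _, _, f in table))   # 32640 rows with F = 1
print(table[1], table[256])          # the 2nd and 257th rows of the TT above
```

Even for this modest 8-bit problem the table is far too large to handle by hand, which is the motivation for the algorithmic approaches that follow.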
Circuit Design Problem (contd)
• Approach 2: Think computationally/algorithmically about
what the ckt is supposed to compute:
• Approach 2(a): Flat algorithmic approach:
– Note: A TT can be expressed as a sequence of “if-then-else’s”
– If A = 00000000 and B = 00000000 then F = 0
else if A = 00000000 and B = 00000001 then F=0
……….
else if A = 00000001 and B = 00000000 then F=1
……….
– Essentially a re-hashing of the TT – same problems as the TT
approach
– Need to think computationally & structurally (i.e., based on the
structure of the program at hand) at a higher level!
Circuit Design Problem (contd)
• Approach 2(b): Structural algorithmic approach:
– Be more innovative, think of the structure/properties of the
computational problem
– E.g., think if the problem can be solved in a hierarchical or divide-&-conquer (D&C) manner:
[Figure: D&C tree. Root problem A is broken into Subprob. A1 (itself broken into A1,1 and A1,2) and Subprob. A2 (broken into A2,1 and A2,2). The solns to A1 and A2 are stitched up to form the complete soln to A. The break-up is done recursively until the subprob size is such that a TT-based design is doable.]
– D&C approach: See if the problem can be “broken up” into 2 or more smaller subproblems whose solns can be “stitched up” to give a soln to the parent prob.
– Do this recursively for each large subprob until the subprobs are small enough for a TT-based solution
– If the subprobs are of a similar kind (but of smaller size) to the root prob, then the break-up and stitching will also be similar
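The recursive break-up in the figure can be modeled in software. The sketch below (Python; `decompose` and its `leaf_size` threshold are illustrative assumptions, not from the notes) splits a bit range into an MS half and an LS half until a subproblem is small enough for a TT, labeling the nodes A, A1, A1,1, ... as in the figure.

```python
def decompose(hi, lo, label="A", leaf_size=2):
    """List the subproblems of a D&C split of bit range [hi..lo] in pre-order.

    Each entry is (label, hi, lo). A range is split into an MS half and an
    LS half until it has at most `leaf_size` bits (small enough for a TT).
    """
    nodes = [(label, hi, lo)]
    size = hi - lo + 1
    if size > leaf_size:
        mid = lo + size // 2                 # MS half is [hi..mid], LS half is [mid-1..lo]
        sep = "" if label == "A" else ","    # A -> A1, but A1 -> A1,1
        nodes += decompose(hi, mid, label + sep + "1", leaf_size)      # MS subproblem
        nodes += decompose(mid - 1, lo, label + sep + "2", leaf_size)  # LS subproblem
    return nodes

for label, hi, lo in decompose(7, 0):
    print(f"{label}: bits [{hi}..{lo}]")
```

With `leaf_size=2` an 8-bit range unfolds into exactly the seven nodes A, A1, A1,1, A1,2, A2, A2,1, A2,2 of the figure; each leaf then gets a TT-based design, and each internal node needs a stitch-up circuit.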
Shift Gears: Design of a Parity Detection Circuit—A Series of XORs
[Figure: (a) a linearly-connected circuit: a chain of XOR gates on x(0), x(1), …, x(15); (b) a 16-bit parity tree of XOR gates with internal wires w(j,i) and output w(0,0) = f]
The parity tree (b) computes:
f = (((x(15) xor x(14)) xor (x(13) xor x(12))) xor ((x(11) xor x(10)) xor (x(9) xor x(8))))
    xor (((x(7) xor x(6)) xor (x(5) xor x(4))) xor ((x(3) xor x(2)) xor (x(1) xor x(0))))
• No concurrency in design (a): the actual problem has available concurrency, but it is not exploited in the above “linear” design
• Complete sequentialization leads to a delay that is linear in the # of bits n (delay = (n-1)*td, where td = delay of 1 gate)
• All the available concurrency is exploited in design (b), a parity tree.
• Question: When can we have a tree-structured circuit for a chain of the same operation on multiple operands?
• Answer: (1) First of all, when the operation makes sense for any # of operands. (2) It should be possible to break it down into smaller-size operations. (3) Finally, when the operation is associative. An operation “x” is said to be associative if: a x b x c = (a x b) x c = a x (b x c).
• Thus if we have 4 operands and 3 operations, a x b x c x d, we can either perform this as a x (b x (c x d)) [getting a linear delay of 3 units] or as (a x b) x (c x d) [getting a logarithmic (base 2) delay of 2 units and exploiting the available concurrency due to the fact that “x” is associative].
• We can extend this idea to n operands (& n-1 operations) to perform as many of the pairwise operations as possible in parallel (& do this recursively for every level of remaining operations), similar to design (b) for the parity detector [xor is an associative operation!], and thus get a (log2 n) delay.
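The delay difference between designs (a) and (b) can be checked with a small software model. The sketch below (Python; the helper names and the 2 ns gate delay td are assumptions for illustration) computes the parity both ways and counts the XOR levels on the critical path, i.e., the number of gates that must wait for one another.

```python
def parity_linear(bits):
    """Design (a): chain of XORs. Returns (parity, gate levels on the critical path)."""
    value, levels = bits[0], 0
    for b in bits[1:]:
        value ^= b
        levels += 1              # each XOR must wait for the previous one
    return value, levels

def parity_tree(bits):
    """Design (b): balanced XOR tree. Returns (parity, gate levels on the critical path)."""
    if len(bits) == 1:
        return bits[0], 0
    mid = len(bits) // 2
    left, d_left = parity_tree(bits[:mid])
    right, d_right = parity_tree(bits[mid:])
    # The two halves are computed concurrently, so only the slower one counts.
    return left ^ right, max(d_left, d_right) + 1

TD_NS = 2  # assumed gate delay td, in ns
bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]   # x(15) .. x(0), arbitrary values
p_a, lv_a = parity_linear(bits)
p_b, lv_b = parity_tree(bits)
print(p_a, p_b, lv_a * TD_NS, lv_b * TD_NS)   # prints: 1 1 30 8
```

Both designs compute the same parity because xor is associative; only the grouping, and hence the critical path (15 vs. 4 gate levels for n = 16), changes.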
Delay of design (b) = (# of levels in XOR tree) * td = log2(n) * td
An example of simple designer ingenuity: a bad design would have resulted in a linear delay that the VHDL code & the synthesis tool would have been at the mercy of.
D&C for Associative Operations
• Let $f(x_{n-1}, \ldots, x_0)$ be an associative function.
• What is the D&C principle involved in the design of an n-bit xor/parity
function? Can it also lead automatically to a tree-based ckt?
[Figure: $f(x_{n-1}, \ldots, x_0) = f(a, b)$, where $a = f(x_{n-1}, \ldots, x_{n/2})$ and $b = f(x_{n/2-1}, \ldots, x_0)$; the stitch-up function $f(a, b)$ is the same as the original function, applied to 2 inputs]
• Using the D&C approach for an associative operation results in the stitch-up function being the same as the original function (not the case for non-assoc. operations), but w/ a constant # of operands (2, if the orig. problem is broken into 2 subproblems)
• If the two sub-problems of the D&C approach are balanced (of the same
size or as close to it as possible), then unfolding the D&C results in a
balanced operation tree of the type for the xor/parity function seen earlier
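The D&C principle for associative functions can be sketched generically: split the operand list in half, solve each half, and stitch up with the same function on 2 operands. The Python sketch below (`tree_reduce` and `linear_reduce` are hypothetical names) also shows why associativity matters: for xor the balanced tree and the linear chain agree, while for subtraction, which is not associative, they do not.

```python
from functools import reduce
from operator import sub, xor

def tree_reduce(op, xs):
    """Balanced D&C evaluation: the stitch-up is `op` itself on the two half results."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return op(tree_reduce(op, xs[:mid]), tree_reduce(op, xs[mid:]))

def linear_reduce(op, xs):
    """Linearly-connected evaluation: ((x0 op x1) op x2) op ..."""
    return reduce(op, xs)

print(tree_reduce(xor, [1, 0, 1, 1]), linear_reduce(xor, [1, 0, 1, 1]))   # 1 1
print(tree_reduce(sub, [8, 4, 2, 1]), linear_reduce(sub, [8, 4, 2, 1]))   # 3 1
```

For subtraction the tree computes (8 - 4) - (2 - 1) = 3 while the chain computes ((8 - 4) - 2) - 1 = 1, so a balanced tree with the function itself as the stitch-up is only valid for associative operations.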
Using Generate Statements for Describing a Tree-Structured Circuit
[Figure: 16-bit parity tree: inputs x(15) … x(0) feed XOR gates with o/ps w(3,0) … w(3,7); these feed w(2,0) … w(2,3), then w(1,0), w(1,1), and finally w(0,0) = f. Delay = (# of levels in XOR tree) * td = log2(n) * td]
Note: (a) w(i,j) = wire(i,j)
(b) Signals of the 2-d array “wire” not shown are unused, and will not be synthesized

-- 2-input XOR gate used by parity_tree (compile it first)
library ieee;
use ieee.std_logic_1164.all;

entity xor_2 is
  generic (td : time := 2 ns);  -- propagation delay of gate
  port (a, b : in std_logic; c : out std_logic);
end entity xor_2;

architecture behav of xor_2 is
begin
  c <= a xor b after td;
end architecture behav;

library ieee;
use ieee.std_logic_1164.all;

entity parity_tree is  -- a (2**k)-bit parity tree; assumes k >= 1
  generic (k : natural; gate_delay : time := 2 ns);  -- n = 2**k is the # of inputs
  port (x : in std_logic_vector(2**k - 1 downto 0);
        f : out std_logic);
end entity parity_tree;

architecture struct of parity_tree is
  type matrix is array (k-1 downto 0, 2**k - 1 downto 0) of std_logic;
  signal wire : matrix;
begin
  outer_loop: for j in k-1 downto 0 generate
    inner_loop: for i in 0 to 2**j - 1 generate
      first_level: if j = k-1 generate  -- level fed directly by the inputs x
        xor_gates_level1: entity work.xor_2(behav)  -- direct instantiation
          generic map (gate_delay)  -- pass gate delay to xor
          port map (x(2*i), x(2*i+1), wire(j, i));
      end generate first_level;
      lower_levels: if j < k-1 generate  -- levels fed by the level above
        xor_gates_lower: entity work.xor_2(behav)
          generic map (gate_delay)
          port map (wire(j+1, 2*i), wire(j+1, 2*i+1), wire(j, i));
      end generate lower_levels;
    end generate inner_loop;
  end generate outer_loop;
  f <= wire(0, 0);
end architecture struct;
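The two nested generate loops can be hard to visualize. The Python sketch below is only a software model written by hand (not generated from the VHDL, and the helper names are hypothetical): it replays the outer_loop/inner_loop/if-generate structure of `parity_tree`, lists which wire(j,i) each xor_2 instance drives, and evaluates the resulting netlist. Like the VHDL, it assumes k >= 1.

```python
def elaborate_parity_tree(k):
    """Mirror the outer_loop/inner_loop generate of parity_tree (assumes k >= 1).

    Returns a list of XOR gates as ((input1, input2), output) in the order the
    generate loops create them.
    """
    gates = []
    for j in range(k - 1, -1, -1):          # outer_loop: for j in k-1 downto 0
        for i in range(2 ** j):             # inner_loop: for i in 0 to 2**j - 1
            if j == k - 1:                  # first_level: fed by the inputs x
                ins = (f"x({2*i})", f"x({2*i+1})")
            else:                           # lower_levels: fed by level j+1
                ins = (f"wire({j+1},{2*i})", f"wire({j+1},{2*i+1})")
            gates.append((ins, f"wire({j},{i})"))
    return gates

def simulate(k, x_bits):
    """Evaluate the elaborated netlist; x_bits[i] is the value of x(i)."""
    values = {f"x({i})": b for i, b in enumerate(x_bits)}
    for (in1, in2), out in elaborate_parity_tree(k):   # level k-1 first, so inputs are ready
        values[out] = values[in1] ^ values[in2]
    return values["wire(0,0)"]                         # f <= wire(0,0)

for gate in elaborate_parity_tree(2):
    print(gate)
```

For k = 4 the loops create 2^k - 1 = 15 gates, while the matrix `wire` has 4 x 16 = 64 elements; the 49 elements no gate drives are exactly the unused signals mentioned in note (b).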
Comparator Circuit Design Using D&C
• Useful property: At any
level, comp. of MS (most
significant) half determines
o/p if result is > or < else
comp. of LS ½ determ. o/p
• Can thus break up problem
at any level into MS ½ and
LS ½ comparisons & based
on their results determine
which o/p to choose for the
higher-level (parent) result
• Is this associative? Not sure.
• For a non-associative func, determine its property(ies) that allow determining a correct stitch-up function (requires ingenuity, solid thinking)
[Figure: D&C tree for the comparator, with each node's stitch-up rule]
A: Comp. A[7..0], B[7..0]
  A1: Comp A[7..4], B[7..4]
    A1,1: Comp A[7..6], B[7..6]
      A1,1,1: Comp A[7], B[7]
      A1,1,2: Comp A[6], B[6]
      Stitch-up: if A1,1,1 res. is > or < take A1,1,1 res. else take A1,1,2 res.
    A1,2: Comp A[5..4], B[5..4]
    Stitch-up: if A1,1 res. is > or < take A1,1 res. else take A1,2 res.
  A2: Comp A[3..0], B[3..0]
  Stitch-up of solns to A1 and A2 to form the complete soln to A: if A1 result is > or < take A1 result else take A2 result
The TT may be derived directly or by first thinking of and expressing its
computation in a high-level programming language and then converting
it to a TT.
The leaves are small enough to be designed using a TT:
A[i] B[i] | f1(i) f2(i)
 0    0   |  0     1
 0    1   |  0     0
 1    0   |  1     0
 1    1   |  0     1
(1-bit comparator: 2 input bits, 2 o/ps)

if A[i] = B[i] then { f1(i) = 0; f2(i) = 1 }
    /* f2(i) o/p is an i/p to the stitch logic; f2(i) = 1 means the f1( ), f2( ) o/ps
       of the LS ½ of this subtree should be selected by the stitch logic as its o/ps */
else if A[i] < B[i] then { f1(i) = 0;    /* indicates < */
                           f2(i) = 0 }   /* indicates f1(i), f2(i) o/ps should be selected by the stitch logic as its o/ps */
else if A[i] > B[i] then { f1(i) = 1;    /* indicates > */
                           f2(i) = 0 }
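The leaf TT is small enough to check exhaustively. In this Python sketch (function names are hypothetical), `one_bit_comp` follows the if-then-else description, and `one_bit_comp_gates` uses logic expressions read off the TT: f1(i) = A[i]·B[i]' and f2(i) = (A[i] xor B[i])', i.e., f2 is an XNOR.

```python
def one_bit_comp(a, b):
    """Return (f1, f2) for single bits a = A[i], b = B[i].

    f1 = 1 means '>'; f2 = 1 means '=' (the stitch logic should then select the LS half).
    """
    if a == b:
        return (0, 1)
    if a < b:
        return (0, 0)
    return (1, 0)

def one_bit_comp_gates(a, b):
    """Same function as gates: f1 = a AND (NOT b), f2 = NOT (a XOR b)."""
    return (a & (1 - b), 1 - (a ^ b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, one_bit_comp(a, b), one_bit_comp_gates(a, b))
```

Both versions reproduce the four TT rows, so the leaf needs only one AND gate (with one inverted input) and one XNOR gate.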
Comparator Circuit Design Using D&C (contd.)
• Once the D&C tree is formulated
it is easy to get the low-level &
stitch-up designs
• Stitch-up design shown here
[Figure: the D&C tree of the previous slide; each internal node stitches up the solns of its MS and LS children, e.g., “If A1 result is > or < take A1 result else take A2 result”; the leaves are 1-bit comparators with the TT above]
Stitch-up logic details (f1(i), f2(i): o/ps of the MS ½ comp.; f1(i-1), f2(i-1): o/ps of the LS ½ comp.):
If f2(i) = 0 then { my_op1 = f1(i); my_op2 = f2(i) }      /* select MS ½ comp. o/ps */
else { my_op1 = f1(i-1); my_op2 = f2(i-1) }               /* select LS ½ comp. o/ps */

Compact TT:
f1(i) f2(i) f1(i-1) f2(i-1) | my_op1   my_op2
  X     0     X       X     | f1(i)    f2(i)
  X     1     X       X     | f1(i-1)  f2(i-1)

Direct design: the stitch-up logic is a 2-bit 2:1 Mux with I0 = (f1(i), f2(i)), I1 = (f1(i-1), f2(i-1)), select = f2(i), and 2-bit o/p my_op = (my_op1, my_op2).
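Putting the leaves and the stitch-up muxes together gives the whole D&C comparator. The Python model below (names are illustrative; bit lists are MS bit first) evaluates both halves before stitching, just as the hardware computes the MS and LS comparisons simultaneously, and then takes F as the f1 bit of the root result.

```python
def one_bit_comp(a, b):
    """Leaf: (f1, f2) = (1, 0) for >, (0, 0) for <, (0, 1) for =."""
    return (a & (1 - b), 1 - (a ^ b))

def stitch(ms, ls):
    """2-bit 2:1 mux: select the MS result unless its f2 bit says '=' (select = f2 of MS)."""
    return ls if ms[1] == 1 else ms

def tree_compare(a_bits, b_bits):
    """D&C comparator on equal-length bit lists given MS bit first."""
    if len(a_bits) == 1:
        return one_bit_comp(a_bits[0], b_bits[0])
    mid = len(a_bits) // 2
    ms = tree_compare(a_bits[:mid], b_bits[:mid])   # in hardware, these two halves
    ls = tree_compare(a_bits[mid:], b_bits[mid:])   # are evaluated concurrently
    return stitch(ms, ls)

def greater_than(A, B, n=8):
    """F = f1 of the root: 1 iff A > B for n-bit unsigned A, B."""
    to_bits = lambda v: [(v >> i) & 1 for i in range(n - 1, -1, -1)]
    return tree_compare(to_bits(A), to_bits(B))[0]

print(greater_than(0b10010110, 0b10010011), greater_than(5, 5), greater_than(3, 200))  # 1 0 0
```

Note that the stitch-up reads only the f2 bit of the MS half as its select, which is why the same 2:1 mux works at every level of the tree.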
Comparator Circuit Design Using D&C – Final Design
• H/W_cost(8-bit comp.) = 7 (H/W_cost(2:1 Mux)) + 8 (H/W_cost(1-bit comp.))
• H/W_cost(n-bit comp.) = (n-1) (H/W_cost(2:1 Mux)) + n (H/W_cost(1-bit comp.))
• Delay(8-bit comp.) = 3 (delay of 2:1 Mux) + delay of 1-bit comp.
• Note parallelism at work: multiple logic blocks are processing simultaneously.
• Delay(n-bit comp.) = log2 n (delay of 2:1 Mux) + delay of 1-bit comp.
[Figure: eight 1-bit comparators on A[7] B[7] … A[0] B[0] produce the 2-bit o/ps f(7) … f(0); log n = 3 levels of 2:1 Muxes combine them. Each 2-bit Mux takes the MS o/p on I0 and the LS o/p on I1 and is selected by the f2 bit of its MS input (f(7)(2), f(5)(2), f(3)(2), f(1)(2) at the first level, then my(1)(2) and my(3)(2)), producing the intermediate o/ps my(0) … my(5); the final 1-bit 2:1 Mux, selected by my(5)(2), produces F = my1(6).]
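The cost and delay formulas follow from the tree shape: a binary tree with n leaves has n-1 internal nodes (the muxes) and, for n a power of 2, log2 n levels. The Python sketch below (the unit delays in the example call are hypothetical values, not from the notes) computes these figures and cross-checks the mux count against the recursion muxes(n) = muxes(MS half) + muxes(LS half) + 1.

```python
def mux_count(n):
    """2:1 muxes in the D&C comparator: one per internal node of the tree."""
    return 0 if n == 1 else mux_count(n - n // 2) + mux_count(n // 2) + 1

def comparator_figures(n, mux_delay, comp_delay):
    """Counts and worst-case delay of the n-bit D&C comparator (n a power of 2)."""
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    levels = n.bit_length() - 1            # log2 n levels of muxes
    return {
        "muxes": n - 1,
        "one_bit_comps": n,
        "mux_levels": levels,
        "delay": levels * mux_delay + comp_delay,
    }

print(comparator_figures(8, mux_delay=2, comp_delay=2))
print(comparator_figures(64, mux_delay=2, comp_delay=2))
```

Going from 8 to 64 bits multiplies the hardware by about 8 but adds only 3 more mux levels to the delay, which is the payoff of the parallel tree.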
Comparator Circuit Design Using D&C – Behavioral
Description using a High-Level Language
• Recursive Description (returns the 2-bit result (f1, f2) in the 1-bit comparator encoding):
Procedure Compare(A[m..k], B[m..k])
Begin
  if m-k >= 1 then {
    h = m - (m-k)/2;   /* integer division; MS half is [m..h], LS half is [h-1..k] */
    (f1, f2) = Compare(A[m..h], B[m..h]);
    if f2 = 0 then return (f1, f2)   /* result has been determined based on MS comp. */
    else return Compare(A[h-1..k], B[h-1..k])
  }
  else {   /* m-k = 0: single-bit comparison problem */
    if A[m] > B[m] then return (1, 0)
    else if A[m] < B[m] then return (0, 0)
    else return (0, 1)
  }
End
• Main program: Compare(A[7..0], B[7..0]); F is the f1 bit of the returned result.
• Problem: The design has been sequentialized: perform the MS comparison → look at the results → if needed, perform the LS comparison, instead of performing the MS and LS comparisons simultaneously.
• Thus no parallelism! Delay is linear in n, as opposed to log n w/ parallelism
• This is a limitation of regular programming languages in specifying parallelism
• We need a Hardware Description Language (HDL) for specifying parallelism; VHDL & Verilog are such languages
[Figure: sequentialized hardware: start → MS 4-bit comparison → if the MS comparison indicates =, enable (start) the LS 4-bit comparison; a 2:1 Mux with I0 = MS o/p and I1 = LS o/p produces the final o/p]
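A direct software transcription of the recursive procedure makes the sequentialization visible. In this Python sketch (names are ours), `trace` records the order in which single-bit comparisons actually happen; when A = B, all n of them run one after another.

```python
def compare(A, B, m, k, trace):
    """Software version of Procedure Compare(A[m..k], B[m..k]); returns (f1, f2).

    `trace` collects, in order, the bit positions whose single-bit comparison
    was actually performed.
    """
    if m - k >= 1:
        h = m - (m - k) // 2                   # MS half is [m..h], LS half is [h-1..k]
        f = compare(A, B, m, h, trace)
        if f[1] == 0:                          # > or < already decided by the MS half
            return f
        return compare(A, B, h - 1, k, trace)  # only now start the LS comparison
    a, b = (A >> m) & 1, (B >> m) & 1          # m = k: single-bit comparison
    trace.append(m)
    if a > b:
        return (1, 0)
    if a < b:
        return (0, 0)
    return (0, 1)

trace = []
print(compare(0x5A, 0x5A, 7, 0, trace), trace)   # (0, 1) [7, 6, 5, 4, 3, 2, 1, 0]
```

The LS call starts only after the MS call has returned, so the worst case is a chain of n dependent single-bit comparisons, unlike the mux tree in which both halves work at once.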
Comparator Circuit Design Using D&C – Behavioral
Description using a High-Level Language
• Iterative Description: flattening the recursion:
Procedure Compare(A[n-1..0], B[n-1..0])
Begin
  for i = n-1 downto 0 do
  {
    if A[i] > B[i] then return(1)
    else if A[i] < B[i] then return(0)
    else if i = 0 and A[0] = B[0] then return(0)   /* A = B */
  }
End
• Main program: Compare(A[7..0], B[7..0]);
• Same problem of sequentialization: the higher-order bit is compared before the next lower-order bit and so on, leading to a delay linear in the # of bits (as opposed to log n with parallelism)
[Figure: eight 1-bit comparators on A[7] B[7] … A[0] B[0], each with st and en signals and o/ps f(7) … f(0); the o/p F is produced by logic that selects the o/p of the 1st comparator from the left that has st = 0]
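The iterative version can be transcribed the same way. This Python sketch (a hypothetical helper) also returns how many single-bit comparisons were performed in sequence, which in the worst case (A = B, or A and B differing only in bit 0) is n.

```python
def compare_iterative(A, B, n):
    """Software version of the iterative Procedure Compare(A[n-1..0], B[n-1..0]).

    Returns (F, steps): F = 1 iff A > B, and steps = number of single-bit
    comparisons performed one after another.
    """
    for steps, i in enumerate(range(n - 1, -1, -1), start=1):
        a, b = (A >> i) & 1, (B >> i) & 1
        if a > b:
            return 1, steps
        if a < b:
            return 0, steps
    return 0, n            # all bits equal: A = B, so F = 0

print(compare_iterative(0x80, 0x7F, 8), compare_iterative(0x12, 0x13, 8))   # (1, 1) (0, 8)
```

The step count grows linearly with n, matching the linear delay of the chained comparators in the figure.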
Concurrent Statements & Component Instantiations in VHDL
• Parallelism or concurrency needs to be explicitly specified in an HDL: a synthesis tool will mostly not be able to extract any parallelism from a description (i.e., coding) that does not explicitly expose the parallelism
• VHDL specifies concurrency using concurrent statements
• VHDL specifies iterative and recursive specifications of concurrency using iterative
generate statements and conditional generate statements
[Figure: OR gate A (inputs a, b; o/p x) and OR gate B (inputs c, d; o/p y) feed OR gate C, whose o/p is z]
• In this simple ckt, OR gates A and B are supposed to operate in parallel/concurrently
• No way to specify this in a s/w prog. lang. (or in a VHDL behavioral description)
VHDL Description:
library ieee;
use ieee.std_logic_1164.all;

entity simple_ckt is
  port (a, b, c, d : in std_logic; z : out std_logic);  -- like procedure input/output variable declarations
end entity simple_ckt;                                   -- above are input/output ports or wires

architecture data_flow of simple_ckt is
  signal x, y : std_logic;  -- declaring internal wires
begin
  x <= a or b;   -- in-built OR function
                 -- may also have "x <= a or b after 2 ns;" for delay
                 -- and similarly for the other assignment statements
  y <= c or d;   -- the order of these statements does
  z <= x or y;   -- not matter; all operate concurrently
end architecture data_flow;

OR

architecture structural of simple_ckt is
  signal x, y : std_logic;
begin
  or_A: entity work.or_gate(behav)   -- predefined OR gate (an or_gate entity in library work)
    port map (a, b, x);              -- this is "direct instantiation"
  or_B: entity work.or_gate(behav)
    port map (c, d, y);              -- order of instantiation doesn't matter
  or_C: entity work.or_gate(behav)
    port map (x, y, z);
end architecture structural;
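To illustrate why the order of the three concurrent assignments does not matter, here is a deliberately simplified Python model. It is not VHDL's delta-cycle simulation algorithm; the hypothetical `settle` helper simply re-evaluates the assignments in a given order until no signal changes (assuming the internal wires start at 0) and checks that every ordering gives the same z.

```python
from itertools import permutations, product

# Each concurrent signal assignment of architecture data_flow: target <= expression.
ASSIGNMENTS = {
    "x": lambda s: s["a"] | s["b"],   # x <= a or b;
    "y": lambda s: s["c"] | s["d"],   # y <= c or d;
    "z": lambda s: s["x"] | s["y"],   # z <= x or y;
}

def settle(inputs, order):
    """Re-evaluate the assignments in `order` until no signal changes; return z."""
    signals = dict(inputs, x=0, y=0, z=0)   # assumed initial value of internal wires
    changed = True
    while changed:
        changed = False
        for target in order:
            new_value = ASSIGNMENTS[target](signals)
            if new_value != signals[target]:
                signals[target] = new_value
                changed = True
    return signals["z"]

for a, b, c, d in product((0, 1), repeat=4):
    inputs = {"a": a, "b": b, "c": c, "d": d}
    results = {settle(inputs, order) for order in permutations(ASSIGNMENTS)}
    assert results == {a | b | c | d}       # every statement order gives the same z
print("statement order does not change z")
```

Listing `z` first only costs extra re-evaluation passes; the settled value is the same, which mirrors the point that concurrent statements describe connections, not a sequence of steps.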
D&C-based Comparator Design Description using VHDL
VHDL Description – Generate, Recursion, Concurrency
library ieee;
use ieee.std_logic_1164.all;

entity tree_comparator is
  generic (n : natural);  -- parameterizes the design size
  port (A, B : in std_logic_vector(n-1 downto 0);
        f : out std_logic_vector(0 to 1));
end entity tree_comparator;

architecture struct_recursive of tree_comparator is
  signal f1, f2 : std_logic_vector(0 to 1);
begin
  simpl_comp: if n = 1 generate
  begin
    leaf_comp: entity work.one_bit_comp(behav)
      port map (A(n-1), B(n-1), f);
  end generate simpl_comp;

  compound_comp: if n > 1 generate
  begin
    comp1: entity work.tree_comparator(struct_recursive)  -- MS half
      generic map (n - n/2)  -- MS half has n - n/2 bits (= n/2 when n is even)
      port map (A(n-1 downto n/2), B(n-1 downto n/2), f1);
    comp2: entity work.tree_comparator(struct_recursive)  -- LS half
      generic map (n/2)
      port map (A(n/2 - 1 downto 0), B(n/2 - 1 downto 0), f2);
    mux_2bit: entity work.mux_two_to_one(behav)
      generic map (2)  -- # of bits
      port map (f1, f2, f1(1), f);  -- f1 & f2 are the 2-bit data i/ps,
                                    -- f1(1) is the 1-bit select, f is the 2-bit output
  end generate compound_comp;
end architecture struct_recursive;

[Figure: an n-bit comp. built from two (n/2)-bit comps., one on A(n-1..n/2), B(n-1..n/2) producing f1 and one on A(n/2-1..0), B(n/2-1..0) producing f2; a 2-bit 2:1 Mux with I0 = f1, I1 = f2 and sel = f1(1) produces f. For n = 1, a single 1-bit comp. with inputs a, b produces f. The D&C tree annotation reads: if A1 result is > or < take A1 result else take A2 result.]
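Recursive instantiation can be understood by elaborating it by hand. The Python sketch below mimics how `tree_comparator(struct_recursive)` unfolds for a given n, listing the leaf comparators and muxes it would create. The helper name is hypothetical, instance paths are simplified (generate labels such as compound_comp are omitted), and the MS half is assumed to get n - n/2 bits so that odd n also works.

```python
def elaborate(n, path="top"):
    """Instances created by tree_comparator(struct_recursive) for an n-bit design.

    Returns (leaf_paths, mux_paths), using only the instance labels
    comp1/comp2/leaf_comp/mux_2bit (generate labels omitted for brevity).
    """
    if n == 1:                                                      # simpl_comp: if n = 1 generate
        return [f"{path}/leaf_comp"], []
    ms_leaves, ms_muxes = elaborate(n - n // 2, f"{path}/comp1")    # A(n-1 downto n/2)
    ls_leaves, ls_muxes = elaborate(n // 2, f"{path}/comp2")        # A(n/2-1 downto 0)
    return ms_leaves + ls_leaves, ms_muxes + ls_muxes + [f"{path}/mux_2bit"]

leaves, muxes = elaborate(4)
print(leaves)
print(muxes)
```

For every n the recursion bottoms out in n leaf comparators joined by n-1 muxes, the same totals as in the hardware-cost formula for the final design.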
D&C-based Comparator Design Description using VHDL (contd.)
Component Descriptions:
[Figure: symbols of the 1-bit comp. (inputs a, b; 2-bit o/p f) and of the 2-bit 2:1 Mux (2-bit inputs I0, I1; select sel; 2-bit o/p)]

library ieee;
use ieee.std_logic_1164.all;

entity one_bit_comp is
  port (a, b : in std_logic; f : out std_logic_vector(0 to 1));
end entity one_bit_comp;

architecture behav of one_bit_comp is
begin
  foo: process (a, b) is
  begin
    -- a delay may be modeled with "after", e.g., f(0) <= '1' after 10 ns;
    -- ("wait for" is not allowed in a process that has a sensitivity list)
    if a > b then f(0) <= '1'; f(1) <= '0';      -- A[i] > B[i]
    elsif a < b then f(0) <= '0'; f(1) <= '0';   -- A[i] < B[i]
    else f(0) <= '0'; f(1) <= '1';               -- A[i] = B[i]
    end if;
  end process foo;
end architecture behav;

library ieee;
use ieee.std_logic_1164.all;

entity mux_two_to_one is
  generic (width : natural);
  port (I0, I1 : in std_logic_vector(0 to width-1);
        sel    : in std_logic;
        f      : out std_logic_vector(0 to width-1));
end entity mux_two_to_one;

architecture behav of mux_two_to_one is
begin
  foo: process (I0, I1, sel) is
  begin
    -- a delay may be modeled with "after", e.g., f <= I0 after 5 ns;
    if sel = '0' then f <= I0;
    elsif sel = '1' then f <= I1;
    else f <= (others => 'X');   -- unknown select; also avoids an inferred latch
    end if;
  end process foo;
end architecture behav;
Summary
• For complex digital design, we need to think of the “computation” underlying the design in an algorithmic and high-level manner:
– is it amenable to the D&C approach (i.e., can it be broken into smaller-sized problems whose outputs can be stitched up)?
– are there properties of this computation that can be exploited for a faster, less expensive, modular design?
• The design is then developed in a D&C manner & the corresponding circuit may be synthesized by describing it compactly using a structural HDL form
• For an operation/func x on n operands ($a_{n-1}$ x $a_{n-2}$ x … x $a_0$), if x is associative, the D&C approach gives an “easy” stitch-up function, which is x on 2 operands (the o/ps of applying x on each half). This allows a tree-structured circuit with a delay logarithmic in n to be synthesized, instead of a linearly-connected circuit with a delay linear in n.
• If x is non-associative, more ingenuity and determination of the properties of x are needed to determine the break-up of the function and the stitch-up function. The resulting design may or may not be tree-structured
• A hardware description language with a structural form is useful to describe large circuits with all the designed parallelism, and then have them synthesized automatically.
• VHDL provides special hardware-oriented constructs for the description of hardware that are not available in regular sequential s/w programming languages: especially, concurrency (via data flow or instantiation statements) and circuit-delay specifications.
• VHDL also has constructs that ease the description of regular-patterned circuits (linear arrays, multi-dimensional arrays, regular trees, etc.) of arbitrary size: generate statements and recursion.
Copyright Notice for RASSP Slides
(material included w/ explicit acknowledgement in next few slides)
Structural VHDL allows the designer to represent a system in terms of components and their
interconnections. This module discusses the constructs available in VHDL to facilitate structural
descriptions of designs.
Generate Statement --- from RASSP slides
• Structural descriptions of large, but highly regular, structures can be tedious. A VHDL
GENERATE statement can be used to include as many concurrent VHDL statements (e.g.
component instantiation statements) as needed to describe a regular structure easily.
• In fact, a GENERATE statement may even include other GENERATE statements for more complex devices. Some common examples include the instantiation and connection of multiple identical components such as half adders to make up a full adder, or exclusive-OR gates to create a parity tree.
Generate Statement – For Scheme --- from RASSP slides
• VHDL provides two different schemes of the GENERATE statement, the FOR-scheme and the IF-scheme.
• The syntax of the FOR-scheme is: label : FOR N IN range GENERATE {concurrent statements} END GENERATE [label];
• The FOR-scheme is reminiscent of a FOR loop used for sequence control in many programming languages.
• The FOR-scheme generates the included concurrent statements the assigned number of times. In the
FOR-scheme, all of the generated concurrent statements must be the same.
• The loop variable is created in the GENERATE statement and is undefined outside that statement (i.e. it is
not a variable or signal visible elsewhere in the architecture).
• The loop variable in this FOR-scheme case is N. The range can be any valid discrete range. After the GENERATE keyword, the concurrent statements to be generated are stated, and the GENERATE statement is closed with END GENERATE.
Generate Statement – For Scheme Example --- from RASSP slides
• This slide shows an example of the FOR-scheme.
• The code generates an array of AND gates. In this case, the GENERATE statement has been named
G1 and instantiates an array of 8 and_gate components.
• The PORT MAP statement maps the interfaces of each of the 8 gates to specific elements of the S1,
S2, and S3 vectors by using the FOR loop variable as an index.
Generate Statement – If Scheme --- from RASSP slides
• The second form of the GENERATE statement is the IF-scheme. This scheme allows for conditional
generation of concurrent statements.
• One obvious difference between this scheme and the FOR-scheme is that all the concurrent
statements generated do not have to be the same.
• While this IF statement may seem reminiscent of the IF-THEN-ELSE constructs in programming languages, note that the GENERATE IF-scheme does not provide ELSE or ELSIF clauses.
• The Boolean expression of the IF statement can be any valid Boolean expression.
Generate Statement – If Scheme Example --- from RASSP slides
• The example here uses the IF-scheme GENERATE statement to make a modification to the and_gate
array such that the seventh gate of the array will be an or_gate.
• Another example use of the IF-scheme GENERATE is in the conditional execution of timing checks.
Timing checks can be incorporated inside a GENERATE IF-scheme. E.g., the foll. statement can be used:
Check_time : IF TimingChecksOn GENERATE
This allows the boolean variable TimingChecksOn to enable timing checks by generating the appropriate
concurrent VHDL statements in the description. This parameter can be set in a package or passed as a
generic and can improve simulation speed by shutting off this computational section.
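The G1 code of the RASSP slides is not reproduced in these notes, so the following Python model is only a sketch under stated assumptions: a loop range of 1 to 8, ports mapped positionally to S1(N), S2(N), S3(N), and gate 7 replaced by an or_gate. Because the IF-scheme has no ELSE, it uses two complementary IF conditions, just as a VHDL description would need two IF-generate statements; `elaborate_g1` is a hypothetical name.

```python
def elaborate_g1(n_gates=8, or_index=7):
    """Model of a FOR-generate labeled G1 containing two IF-generates.

    Assumes the loop range is 1 to n_gates and that gate `or_index` becomes an
    or_gate. Since the IF-scheme has no ELSE, the complementary case needs its
    own IF condition.
    """
    instances = []
    for N in range(1, n_gates + 1):                    # G1: FOR N IN 1 TO 8 GENERATE (assumed range)
        ports = (f"S1({N})", f"S2({N})", f"S3({N})")   # loop variable used as the index
        if N == or_index:                              # IF N = 7 GENERATE
            instances.append(("or_gate", ports))
        if N != or_index:                              # IF N /= 7 GENERATE
            instances.append(("and_gate", ports))
    return instances

for kind, ports in elaborate_g1():
    print(kind, ports)
```

Exactly one of the two conditions holds for each N, so every loop iteration still produces one gate, which is the usual way to express an "else" with IF-generate.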