ELEC692n VLSI Signal Processing Architecture Assignment #2 (Due on Oct. 23)

1. Unfold the DFGs in Fig. 1 using unfolding factors 2 and 5.

[Fig. 1: The DFGs for problem 1.]

2. Our objective in this problem is to prove that the critical path of a J-unfolded DFG is a monotonically non-decreasing function of J. To show this, prove that the critical path of a J-unfolded DFG is greater than or equal to the critical path of the (J-1)-unfolded DFG.

3. In this problem, we wish to show that the order of retiming and unfolding is immaterial when the goal is to minimize the critical path of the unfolded DFG. Perform (a) and (b) for the DFG in Fig. 2:
   a) Unfold the DFG with unfolding factor J = 2, and then retime the unfolded DFG to minimize the clock period.
   b) Let the retiming function determined in part (a) be denoted $r'$. Define the retiming function $r$ by
      $r(A) = r'(A_0) + r'(A_1)$;  $r(B) = r'(B_0) + r'(B_1)$;  $r(C) = r'(C_0) + r'(C_1)$;  $r(D) = r'(D_0) + r'(D_1)$.
      Show that retiming the DFG in Fig. 2 using $r$ results in a DFG that, when unfolded, achieves the same critical path as the result of part (a). Do this by showing that if $D(U,V) > c$, then $W(U,V) + r(V) - r(U) \ge J$ holds for all pairs of nodes $U$, $V$ in the retimed version of the original DFG, where $c$ is the critical path obtained in part (a).

[Fig. 2: The DFG for problem 3.]

4. Consider the DFG in Fig. 3. The numbers in parentheses are the computation times of the nodes.
   a) What is the iteration bound of this DFG? What is the actual iteration period?
   b) Retime this DFG to minimize the iteration period. What is the actual iteration period of the retimed DFG?
   c) Unfold both the original DFG and the retimed DFG by a factor of 2. What are their actual iteration periods?
   d) Determine the minimum unfolding factor J such that the J-unfolded DFG (unfolded from the original DFG) can be retimed so that its critical path is $J \cdot T_\infty$, where $T_\infty$ is the iteration bound of the original DFG in Fig. 3.
   Unfold the DFG by this minimum unfolding factor and retime the unfolded DFG so that its critical path is $J \cdot T_\infty$.

[Fig. 3: The DFG for problem 4.]

5. Consider the 6-tap FIR filter

   $$y(n) = \sum_{i=0}^{5} h_i \, x(n-i)$$

   implemented using the data-broadcast form shown in Fig. 4. This filter is implemented using folding factor 2 with the folding sets $S_0 = \{MA5, MA4\}$, $S_1 = \{MA3, MA2\}$, $S_2 = \{MA1, MA0\}$.
   a) Design the folded architecture.
   b) Construct a schedule corresponding to the folded architecture and verify that the folded architecture generates the desired filter output samples.

[Fig. 4: A 6-tap data-broadcast FIR filter for problem 5.]

6. The goal of this problem is to fold the lattice filter shown in Fig. 5 using the folding sets given with the figure. Assume that the multiply operations are mapped to multipliers pipelined by 2 stages and that the add operations are mapped to 1-stage pipelined adders. The hardware architecture must be clocked with a clock period of 1 u.t.
   a) Systematically perform retiming for folding so that all folded edge delays are nonnegative.
   b) Fold the retimed DFG.

[Fig. 5: The lattice filter used in problem 6.]
Folding sets: SA1 = {A2, A1}, SA2 = {A3, A4}, SM1 = {M1, M2}, SM2 = {M3, M4}, SM3 = {∅, M5}.

7. Dynamic programming (DP) has been used to solve problems in communications, controls, artificial intelligence, operations research, etc. DP problems have the property that the optimum solution from an initial iteration to iteration i + j must consist of the optimum solution from the initial iteration to iteration i and the optimum solution from iteration i to iteration i + j.
   In signal processing, DP is frequently used in Viterbi decoders in communication systems and in speech recognition systems based on hidden Markov models [15]. Consider the N-state DP problem given by

   $$x_i(n+1) = \max_{j} \left[ x_j(n) + a_{ji}(n) \right], \quad i, j = 1, 2, \ldots, N, \qquad (6.9)$$

   where $x_i(n)$ is the value of state i in the n-th iteration, and the variables $a_{ji}$ are referred to as the trellis or path coefficients. The fundamental operation in (6.9) is add-compare-select (ACS). An N-state DP problem requires $N^2$ ACS operations to update the N state values. The DFG for N = 4 is shown in Fig. 6; it corresponds to a ring systolic structure. The coefficient $a_{ji}$ is stored in node $A_{ji}$, and the DFG wraps around: edge i on the right is connected to the edge on the left with the same number.
   (a) Write down the computations performed by all ACS units in the DFG in Fig. 6 during the n-th iteration, and verify that this DFG updates $x_1(n+1)$, $x_2(n+1)$, $x_3(n+1)$, and $x_4(n+1)$ along the first, second, third, and fourth columns, respectively.
   (b) Fold the DFG using folding factor 4 and the following folding sets:
       $S_1 = \{A_{41}, A_{31}, A_{21}, A_{11}\}$, $S_2 = \{A_{12}, A_{42}, A_{32}, A_{22}\}$, $S_3 = \{A_{23}, A_{13}, A_{43}, A_{33}\}$, $S_4 = \{A_{34}, A_{24}, A_{14}, A_{44}\}$.
       (Hint: You should get a ring systolic structure.)
   (c) Fold the DFG using folding factor 8 and the following folding sets:
       $S_1 = \{A_{41}, A_{23}, A_{31}, A_{13}, A_{21}, A_{43}, A_{11}, A_{33}\}$, $S_2 = \{A_{12}, A_{34}, A_{42}, A_{24}, A_{32}, A_{14}, A_{22}, A_{44}\}$.
   (d) Fold all the nodes in the DFG onto one processing element using folding factor 16 and the following folding set:
       $S_1 = \{A_{41}, A_{12}, A_{23}, A_{34}, A_{31}, A_{42}, A_{13}, A_{24}, A_{21}, A_{32}, A_{43}, A_{14}, A_{11}, A_{22}, A_{33}, A_{44}\}$.

[Fig. 6: The DFG for the DP used in problem 7.]
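As a reference for checking the state updates in problem 7(a), the following minimal Python sketch evaluates one iteration of (6.9) directly. The double loop performs exactly one add-compare-select per (i, j) pair, i.e. N^2 ACS operations per iteration. It is a sketch under stated assumptions: the function name `acs_update` is hypothetical, indices are 0-based with `a[j][i]` holding $a_{ji}(n)$ for the current iteration, and it models only the arithmetic of (6.9), not the ring systolic timing of Fig. 6.

```python
from typing import List, Sequence


def acs_update(x: Sequence[float], a: Sequence[Sequence[float]]) -> List[float]:
    """One iteration of (6.9): x_i(n+1) = max_j [x_j(n) + a_ji(n)].

    Hypothetical helper. Assumes 0-based indexing: x[j] holds x_{j+1}(n)
    and a[j][i] holds a_{(j+1)(i+1)}(n), the coefficient on the path
    from state j to state i.
    """
    n_states = len(x)
    new_x: List[float] = []
    for i in range(n_states):
        best = float("-inf")  # running maximum for new state i
        for j in range(n_states):
            # One ACS operation: add the path coefficient,
            # compare with the running maximum, select the larger.
            best = max(best, x[j] + a[j][i])
        new_x.append(best)
    return new_x


if __name__ == "__main__":
    x = [0, 1, 2, 3]
    a = [[1, 5, 0, 2],
         [3, 0, 4, 1],
         [0, 2, 1, 6],
         [2, 1, 3, 0]]
    print(acs_update(x, a))
```

With the example values above (N = 4), `acs_update(x, a)` returns [5, 5, 6, 8]. In the ring systolic DFG of Fig. 6, each node $A_{ji}$ would perform one pass of the inner loop body, with the running maximum passed between nodes along a column.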