Programmable Logic

Problem 1 – Consider the function L(A,B,C,D,E) = A'B' + A'BC + A'BDE + AB'C'D' + AB'C'E'. Derive an implementation with as few 3-input LUTs as possible. Show the content of the SRAM cells in each LUT (you should rely on Shannon’s expansion).

Problem 2 – Consider the function L(A,B,C,D,E,F) = A'B'C' + A'BC + A'BDE + A'BCF + AB'C + AB. Derive an implementation with as few 4-input LUTs as possible. Show the content of the SRAM cells in each LUT.

Problem 3 – Consider the function L(A,B,C,D,E,F) = A'B'CD' + ABCD' + CD'E' + C'D'F + CF'. Derive an implementation with as few 4-input LUTs as possible. Show the content of the SRAM cells in each LUT.

Problem 4 – Consider the function L(A,B,C,D,E,F,G) = AB + CD + EFG. Explore the area/delay trade-offs for this function assuming that the implementation is done using 3-input LUTs. What is the number of LUTs for achieving the minimum propagation delay? What is the minimum number of LUTs when the longest propagation delay is not a concern? Show the content of the SRAM cells in each LUT for both cases.

Problem 5 – Consider the K(A,B,C,D,E,F) and L(A,B,C,D,E,F) functions shown in Figure 1. Determine how this circuit can be implemented with only two 4-input LUTs. Show the content of the SRAM cells in each LUT.

Figure 1 – Circuit for problem 5.

Problem 6 – Consider the 4-bit ripple carry adder shown in Figure 2. The sum and carry bits are implemented using one 3-input LUT each. The input carry for the rightmost adder cell is considered to be 0. It is assumed that the propagation delay through one 3-input LUT is 2 ns and there are no interconnect delays. We consider that the inputs of the circuit change as follows: A3A2A1A0 switches from 0000 to 0001 and B3B2B1B0 switches from 0000 to 0001. Draw the waveforms on the outputs S3S2S1S0 and output/intermediate carry bits C4C3C2C1.

Problem 7 – Repeat problem 6 for the case when A3A2A1A0 switches from 0000 to 0001 and B3B2B1B0 switches from 0000 to 1111.
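
For the adder of problems 6 to 8, it may help to see how one adder cell could map onto two 3-input LUTs. The following is only a minimal sketch and is not taken from the handout: the module names lut3 and adder_cell, the SRAM parameter and the input ordering {a, b, cin} are assumptions made for illustration.

```systemverilog
// Hypothetical model of a 3-input LUT: the 8 SRAM cells are held in a
// parameter and the three inputs select one of them.
module lut3 #(
    parameter logic [7:0] SRAM = 8'h00   // SRAM[i] is the output when in == i
) (
    input  logic [2:0] in,
    output logic       out
);
    assign out = SRAM[in];
endmodule

// One adder cell built from two 3-input LUTs.
// Assumption: the LUT address is formed as {a, b, cin}.
module adder_cell (
    input  logic a, b, cin,
    output logic sum, cout
);
    // sum = a ^ b ^ cin: output is 1 at addresses 1, 2, 4, 7 -> 8'b1001_0110
    lut3 #(.SRAM(8'h96)) sum_lut   (.in({a, b, cin}), .out(sum));

    // cout = majority(a, b, cin): output is 1 at addresses 3, 5, 6, 7 -> 8'b1110_1000
    lut3 #(.SRAM(8'hE8)) carry_lut (.in({a, b, cin}), .out(cout));
endmodule
```

A different input ordering would permute the SRAM contents but not the number of LUTs, which is why the ordering has to be stated whenever the SRAM cells are listed.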

Problem 8 – For the 4-bit ripple carry adder shown in Figure 2, if the output carry bit C4 and the sum bits S3S2S1S0 are buffered into flip-flops and all the inputs are buffered in flip-flops, what is the maximum clock frequency that can be used? Each 3-input LUT has a propagation delay of 2 ns and it is assumed that the propagation, hold and setup times of the flip-flops are ignored.

Figure 2 – Circuit for problems 6 to 8.

Finite State Machines

For the rest of this document, all the sequential circuits should be Moore type and it is assumed that flip-flops are positive-edge triggered with an active-low asynchronous reset.

Problem 9 – Design a sequential circuit that has 1 data input (w) and 1 data output (z). The output z will become 1 only if the input w has been 1 for the last 5 clock cycles. Draw the FSM diagram and write/simulate the Verilog code to verify it. Estimate the resource usage (in terms of logic elements using 4-input LUTs).

Problem 10 – Design a sequential circuit that has 1 data input (w) and 1 data output (z). The output z will become 1 only if the input w has been 1010 over the last 4 clock cycles. Draw the FSM diagram and write/simulate the Verilog code to verify it.

Problem 11 – Design a sequential circuit that has 1 data input (w) and 1 data output (z). The output z will become 1 if in the last 3 clock cycles the number of 1s on the input w has been greater than 1. Draw the FSM diagram and write/simulate the Verilog code to verify it.

Problem 12 – Design a sequential circuit that has 2 data inputs (w1, w0) and 2 data outputs (z1, z0). The output z1 will become 1 only if the two inputs have been identical in the last 2 clock cycles. The output z0 will become 1 only if the two inputs have been complements of each other in the last 2 clock cycles. Draw the FSM diagram and write/simulate the Verilog code to verify it.

Problem 13 – Design a sequential circuit that works as a traffic light controller.
The circuit does not have any data inputs (the clock and asynchronous reset are obviously available) and it has three data outputs: Red, Green and Yellow. After reset the circuit will activate Red for 200 clock cycles, then it will activate Green for 200 clock cycles and then it will activate Yellow for 20 clock cycles (this sequence will repeat itself). While one output signal is turned on the other two output signals are deactivated. Derive the data-path elements and the FSM in the control-path. Write/simulate the Verilog code to verify the entire design.

Problem 14 – Design a home security system as follows. There are 3 input sensors called e (enable), w (window) and d (door). There is one output called s (alarm sound). After reset the system is disarmed. The system stays in the disarmed state until e is pressed, when it will move to the armed state. In this state, if e is pressed again the system will move back to the disarmed state. If, however, the w or d sensor is triggered while the system is in the armed state, the system will activate the alarm sound as follows. Signal s will be turned on for 500 clock cycles, then it will shut down for 300 clock cycles. This sequence will repeat itself until e becomes 1, in which case the system returns to the disarmed state. Derive the data-path elements and the FSM in the control-path. Write/simulate the Verilog code to verify the entire design.

Problem 15 – Design a sequential circuit that tracks the dynamic range in a 16-bit data input, called sample. After an input start signal becomes active, the circuit will update “on-the-fly” the minimum/maximum values observed on the sample input (note: only the values observed after the start signal becomes active will matter when updating the minimum/maximum value). The dynamic range (the difference between the maximum and the minimum values) will be available on the range output. The range will be updated so long as the input stop signal is not activated.
After the input stop signal becomes 1 the circuit will wait for a new start, from which point it resumes its behavior as explained before. Derive the data-path elements and the FSM in the control-path. Write/simulate the Verilog code to verify the entire design.

Verilog Source Code vs. Equivalent Hardware Circuit

Problem 16 – Given the enclosed source code, draw the equivalent hardware circuit.

    module problem16 (
        input logic resetn, clock,
        input logic x, y, z, c1, c2,
        output logic f, g
    );

    always_ff @(posedge clock or negedge resetn) begin
        if (!resetn) begin
            f <= 1'b0;
        end else begin
            f <= x;
            if (c1) f <= y;
            if (c2) f <= z;
        end
    end

    always_comb begin
        if (c2) g = z;
        else if (c1) g = y;
        else g = x;
    end

    endmodule

Problem 17 – Given the enclosed source code, draw the equivalent hardware circuit.

    module problem17 (
        input logic resetn, clock,
        input logic start, stop,
        input logic load, serial_in,
        input logic [7:0] parallel_in,
        output logic parity
    );

    logic shift_left;
    logic [7:0] shift_register;

    always_ff @(posedge clock or negedge resetn) begin
        if (!resetn) begin
            shift_left <= 1'b0;
            shift_register <= 8'd0;
        end else begin
            if (start) shift_left <= 1'b1;
            if (stop) shift_left <= 1'b0;
            if (shift_left)
                shift_register <= {shift_register[6:0],serial_in};
            else if (load)
                shift_register <= parallel_in;
        end
    end

    always_comb begin
        parity = ^shift_register;
    end

    endmodule

Problem 18 – Given the enclosed source code, draw the equivalent hardware circuit.

    module problem18 (
        input logic resetn, clock,
        input logic [7:0] data_in,
        output logic [2:0] msb
    );

    integer i;

    always_ff @(posedge clock or negedge resetn) begin
        if (!resetn) msb <= 3'b000;
        else
            for (i=0; i<8; i=i+1)
                if (data_in[i]) msb <= i;
    end

    endmodule

Problem 19 – Given the enclosed hardware circuit, write/simulate the equivalent Verilog code.

Figure 3 – Circuit for problem 19 (the bitwidth for this data path is 4).

Problem 20 – Given the enclosed hardware circuit, write/simulate the equivalent Verilog code.
Draw the periodic waveform on the output f. If the reference clock frequency is 100 MHz, what is the frequency of signal f?

Figure 4 – Circuit for problem 20 (the bitwidth for this data path is 9).

Problem 21 – Given the enclosed source code, draw the equivalent hardware circuit. Draw also the periodic waveforms on the output signals f and g. If the reference clock is 200 MHz, then what are the frequencies of signals f and g?

    module problem21 (
        input logic resetn, clock,
        output logic f, g
    );

    logic [9:0] counter;

    always_ff @(posedge clock or negedge resetn)
        if (!resetn) begin
            counter <= 10'h000;
            f <= 1'b0;
            g <= 1'b0;
        end else begin
            f <= 1'b1;
            if (counter > 10'd50 && counter < 10'd250) f <= 1'b0;
            g <= 1'b0;
            if (counter > 10'd150 && counter < 10'd400) g <= 1'b1;
            if (counter < 10'd900) counter <= counter + 10'd1;
            else counter <= 10'd0;
        end

    endmodule

Problem 22 – Consider the periodic waveforms shown in Figure 5. Consider that you have a 50 MHz clock signal; design the sequential circuit that generates these waveforms. Find the minimum bitwidth, draw the circuit diagram and write/simulate the equivalent Verilog code.

Figure 5 – Waveforms for problem 22 (figure not drawn to scale and “us” = microseconds).

Problem 23 – Consider the periodic waveform shown in Figure 6. Consider that you have a 50 MHz clock signal; design the sequential circuit that generates this waveform. “Assuming” that you have access to oscillators that can generate clock signals of any frequency you wish, what would be the clock period most suitable for minimizing the size of the sequential circuit that generates the waveform from the figure? Find the bitwidths (for both 50 MHz and the optimal frequency that reduces circuit size by minimizing bitwidth), draw the circuit diagrams and write/simulate the equivalent Verilog code.

Figure 6 – Waveforms for problem 23 (figure not drawn to scale and “us” = microseconds).

Problem 24 – Given the enclosed source code, draw the equivalent hardware circuit.

    module problem24 (
        input logic c1, c2, c3, a, b,
        output logic f, g
    );

    always_latch begin
        if (c1) begin
            f = a;
            g = b;
        end else if (c2) begin
            f = b;
        end else if (c3) begin
            g = a;
        end
    end

    endmodule

Problem 25 – Given the enclosed source code, draw the equivalent hardware circuit.

    module problem25 (
        input logic c1, c2, a, b,
        output logic f, g
    );

    always_latch begin
        case ({c1,c2})
            2'b00: f = a | b;
            2'b01: f = b;
            2'b10: f = a;
            2'b11: g = a ~^ b;
        endcase
    end

    endmodule

Problem 26 – Given the hardware circuit from Figure 7, write/simulate the equivalent Verilog code.

Figure 7 – Circuit for problem 26.

Problem 27 – Given the hardware circuit from Figure 8, write/simulate the equivalent Verilog code.

Figure 8 – Circuit for problem 27.

Problem 28 – Given the enclosed source code, draw the equivalent hardware circuit. The logic for loading registers “a”, “b” and “c” is not given in the sample source code. The same simplifying assumption applies to problems 29 and 30. The purpose of these exercises is to better understand the usage of blocking/non-blocking assignments in always_ff blocks.

    module problem28 (
        input logic clock,
        output logic [7:0] f
    );

    logic [7:0] a, b, c;

    always_ff @(posedge clock) begin
        a = b + c;
        b = c + a;
        c <= a + b;
    end

    assign f = c;

    endmodule

Problem 29 – Given the enclosed source code, draw the equivalent hardware circuit.

    module problem29 (
        input logic clock,
        output logic [7:0] f
    );

    logic [7:0] a, b, c;

    always_ff @(posedge clock) begin
        a = b + c;
        b <= c + a;
        c = a + b;
    end

    assign f = c;

    endmodule

Problem 30 – Given the enclosed source code, draw the equivalent hardware circuit.

    module problem30 (
        input logic clock,
        output logic [7:0] f
    );

    logic [7:0] a, b, c;

    always_ff @(posedge clock) begin
        a <= b + c;
        b = c + a;
        c = a + b;
    end

    assign f = c;

    endmodule

Single-port vs. Dual-port Embedded Memories

Problem 31 – Consider an array A containing 16-bit signed numbers stored in one embedded memory with 256 locations with 16 bits per location.
The array data is stored in “linear” fashion, with element A[i] stored in location “i”. Design a digital circuit that computes the maximum element in the array. This maximum element is stored into a register. Assume that the memory is a single-port RAM with one clock cycle latency. The aim is to reduce the number of clock cycles required to complete the calculation. Write/simulate the Verilog code.

Problem 32 – Repeat problem 31 with a dual-port RAM with one clock cycle latency.

Problem 33 – For the setup from problem 31, design a digital circuit to compute the average value of the elements in the array.

Problem 34 – Repeat problem 33 with a dual-port RAM with one clock cycle latency.

Problem 35 – Consider two arrays A and B containing 16-bit signed numbers stored in one memory with 256 locations with 16 bits per location. Both arrays have 128 elements, with A[i] stored in location “i” and B[i] stored in location “128 + i”. Design a digital circuit that computes S[i] = A[i] + B[i] and D[i] = A[i] - B[i] and replaces A[i] with S[i] and B[i] with D[i]. Assume that the memory is a single-port RAM with one clock cycle latency. The aim is to reduce the number of clock cycles required to complete the calculation. Write/simulate the Verilog code.

Problem 36 – Repeat problem 35 with a dual-port RAM with one clock cycle latency.

Problem 37 – Consider an array A containing 16-bit signed numbers stored in one memory with 256 locations with 16 bits per location. Element A[i] is stored in location “i”. Design a digital circuit that computes an array B as follows: B[i] = (A[i-1] - 2*A[i] + 4*A[i+1])/4. It is assumed that A[-1] = A[0] and A[256] = A[255]. As soon as it is available, element B[i] will replace A[i] in memory location “i”. Assume that the memory is a single-port RAM with one clock cycle latency. The aim is to reduce the number of clock cycles required to complete the calculation. Write/simulate the Verilog code.
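
The phrase “one clock cycle latency” used throughout this section can be pictured with a small behavioural memory model. The sketch below is an illustration only and is not part of the handout: the module name single_port_ram, its port names and the read-during-write behaviour (the old data is returned) are assumptions, and a real embedded memory block may behave differently.

```systemverilog
// Hypothetical behavioural model of a single-port RAM with a registered read.
// Assumption: on a simultaneous write, read_data returns the OLD contents.
module single_port_ram #(
    parameter int ADDR_WIDTH = 8,    // 256 locations
    parameter int DATA_WIDTH = 16    // 16 bits per location
) (
    input  logic                  clock,
    input  logic                  write_enable,
    input  logic [ADDR_WIDTH-1:0] address,
    input  logic [DATA_WIDTH-1:0] write_data,
    output logic [DATA_WIDTH-1:0] read_data
);
    logic [DATA_WIDTH-1:0] mem [0:2**ADDR_WIDTH-1];

    always_ff @(posedge clock) begin
        if (write_enable)
            mem[address] <= write_data;
        // The address sampled at this clock edge produces its data on
        // read_data after the edge, i.e. one clock cycle of latency.
        read_data <= mem[address];
    end
endmodule
```

With a model like this, a new address can be issued every clock cycle while the data for the previous address is being consumed. The single address port is shared by reads and writes, which is the constraint that the dual-port variants of these problems relax.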

Problem 38 – Repeat problem 37 with a dual-port RAM with one clock cycle latency.

Problem 39 – Consider an embedded memory with 16 locations with 10 bits per location that will store the first 16 entries of the Fibonacci series. After power-up A[0] = 1 and A[1] = 1 are assumed to be stored in locations “0” and “1” of the memory. The subsequent 14 entries of the Fibonacci series are computed according to the recurrence equation A[i] = A[i-1] + A[i-2], for “i” from 2 to 15. Design a digital circuit that computes the elements A[i] and stores them in location “i” of the memory. Note that the values in the Fibonacci series are unsigned. Assume that the memory is a single-port RAM with one clock cycle latency. The aim is to reduce the number of clock cycles required to complete the calculation. Write/simulate the Verilog code.

Problem 40 – Repeat problem 39 with a dual-port RAM with one clock cycle latency.

Project-related Problems on Address Generation, Clipping, …

Problem 41 – Consider a data memory of 25600 locations with 8 bits per location. It is assumed that a gray scale image of size 160x160 is stored in the memory in “linear” fashion (i.e., value Yi is stored in location “i”). The image is partitioned into 8x8 blocks that are processed one at a time. Each of the 160 rows in the image contains 160 columns as follows.

Figure 9 – A 160x160 image partitioned in 8x8 blocks for problem 41.

The first block contains samples Y0 … Y7, Y160 … Y167, …, Y1120 … Y1127. The second block contains samples Y8 … Y15, Y168 … Y175, …, Y1128 … Y1135 and so on. Design the address generation circuitry to fetch the data from the memory. How the data is processed is not relevant to this question. Therefore the memory access latency is of no importance. In addition, it is assumed that the blocks are fetched immediately one after another (i.e., address generation works without any stalls).
You are allowed to use datapath elements such as counters, adders, …; however, no multipliers are permitted.

Problem 42 – Repeat problem 41, assuming that the image is partitioned into 16x8 blocks that are processed one at a time. The first block contains samples Y0 … Y15, Y160 … Y175, …, Y1120 … Y1135. The second block contains samples Y16 … Y31, Y176 … Y191, …, Y1136 … Y1151 and so on.

Figure 10 – A 160x160 image partitioned in 16x8 blocks for problem 42.

Problem 43 – Repeat problem 41, assuming that the image is partitioned into 8x16 blocks that are processed one at a time. The first block contains samples Y0 … Y7, Y160 … Y167, …, Y2400 … Y2407. The second block contains samples Y8 … Y15, Y168 … Y175, …, Y2408 … Y2415 and so on.

Figure 11 – A 160x160 image partitioned in 8x16 blocks for problem 43.

Problem 44 – Consider a 16-bit input signal representing a signed number. Design the clipping circuitry that maps the 16-bit input to a 9-bit output as follows. If the input is greater than +255 then the output is saturated to +255. If the input is smaller than -256 then the output is saturated to -256. In any other case the output equals the input.

Problem 45 – Consider a 16-bit input signal representing a signed number. Design the clipping circuitry that maps the 16-bit input to a 10-bit output as follows. If the input is greater than +511 then the output is saturated to +511. If the input is positive and it is smaller than +128 then the output is saturated to +128. If the input is smaller than -512 then the output is saturated to -512. If the input is negative and it is greater than -128 then the output is saturated to -128. In any other case the output equals the input.

Problem 46 – Consider that A, B, S are 8x8 matrices with each element a 32-bit signed number. Assume that you can use only dual-port RAMs with 64 locations with 32 bits per location. Design the digital circuit that performs the following calculation: S = A*B.
You are allowed to use two 32-bit signed multipliers.

Problem 47 – Consider that A, B, C, S are 8x8 matrices with each element a 32-bit signed number. Assume that you can use only dual-port RAMs with 64 locations with 32 bits per location. Design the digital circuit that performs the following calculation: S = A*B*C*A^T*B^T (where ^T denotes the matrix transpose). You are allowed to use one 32-bit signed multiplier.

Problem 48 – Repeat problem 47 with two 32-bit signed multipliers.

Problem 49 – An 8x8 block of data is multiplied sample by sample with the quantization matrix shown below. It is assumed that a new sample is processed every clock cycle. The samples are processed in the zig-zag order defined in the project description. Derive the circuitry that multiplies the 8x8 block of data with the quantization matrix.

Figure 12 – Quantization matrix for problem 49.

Problem 50 – Repeat problem 49 using the quantization matrix shown below.

Figure 13 – Quantization matrix for problem 50.
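
As an illustration of the saturation behaviour described in problem 44, the sketch below shows one possible way to express a 16-bit to 9-bit clip. It is offered as a starting point rather than a reference solution: the module name clip16to9 and the port names data_in and data_out are assumptions, and the range check relies on the observation that a 16-bit two's complement value fits in 9 bits exactly when bits 15 down to 8 are all copies of the sign bit.

```systemverilog
// Hypothetical clipping circuit for the 16-bit to 9-bit case.
module clip16to9 (
    input  logic signed [15:0] data_in,
    output logic signed [8:0]  data_out
);
    logic in_range;

    // The value lies in [-256, +255] when bits 15..8 all equal the sign bit.
    assign in_range = (data_in[15:8] == {8{data_in[15]}});

    always_comb begin
        if (in_range)
            data_out = data_in[8:0];      // pass the low 9 bits through unchanged
        else if (data_in[15])
            data_out = 9'sb1_0000_0000;   // negative overflow: saturate to -256
        else
            data_out = 9'sb0_1111_1111;   // positive overflow: saturate to +255
    end
endmodule
```

Checking the upper bits avoids two full 16-bit magnitude comparators. The extra inner thresholds of problem 45 (+128 and -128) do not fit this pattern directly and would need additional comparison logic.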