1. Design a single port RAM memory called bsm:
module bsm #(
    parameter DATA_WIDTH = 8,  // Data width, default is 8 bits
    parameter ADDR_WIDTH = 4   // Address width, default is 4 bits
)(
    input wire clk,                   // Clock signal
    input wire rstn,                  // Reset signal, active low
    input wire rw,                    // Read/Write control signal (0: Read, 1: Write)
    input wire en,                    // Enable signal (1: Read or Write is enabled)
    inout wire [DATA_WIDTH-1:0] data, // Data bus
    input wire [ADDR_WIDTH-1:0] addr  // Address bus
);
// Memory array definition
reg [DATA_WIDTH-1:0] mem [(1<<ADDR_WIDTH)-1:0];
reg [DATA_WIDTH-1:0] data_out;
reg data_en;
// Bidirectional data bus control:
// output data on read, otherwise high impedance
assign data = (data_en && !rw) ? data_out : {DATA_WIDTH{1'bz}};
always @(posedge clk or negedge rstn) begin
if (!rstn) begin
// Initialization or other logic during reset (optional)
data_out <= 0;
data_en <= 0;
end else if (en) begin
if (rw) begin
// Write operation
mem[addr] <= data;
end else begin
// Read operation
data_out <= mem[addr];
end
data_en <= 1; // Enable the data bus
end else begin
data_en <= 0; // Disable the data bus
end
end
endmodule
Explanation of the Code:
Parameterization: The module uses parameters DATA_WIDTH and ADDR_WIDTH to define
the data width and address width, making it flexible to adapt to different configurations.
Bidirectional Data Bus: The data bus is implemented as bidirectional. The assign statement
and the data_en control signal manage the direction. On a read, data_out is registered and
driven onto data after the clock edge that samples the read. In other cases, data is set to
high impedance so that an external driver can place write data on the bus.
Read/Write Logic:
When rw is 1 and en is enabled, a write operation is performed, storing the value of data at the
memory location specified by addr.
When rw is 0 and en is enabled, a read operation is performed, loading the data from the
memory location specified by addr into data_out for output on the data bus.
Enable Signal (en): This signal controls whether a read or write operation is executed. If en is
low, no operation occurs, and the data bus remains disabled.
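The bus turnaround described above can be exercised with a small testbench. The following is only a minimal sketch, assuming it is compiled together with the bsm module listed above; the names bsm_tb, drive_val and drive_en are hypothetical and not part of the original design. It writes 8'hA5 to address 3, releases the bus, and then reads the same address back.

```verilog
// Minimal sketch of a bsm testbench (bsm_tb, drive_val, drive_en are illustrative names)
module bsm_tb;
    reg        clk = 0;
    reg        rstn = 0;
    reg        rw = 0;
    reg        en = 0;
    reg  [3:0] addr = 0;
    reg  [7:0] drive_val = 0;  // Value the testbench places on the bus
    reg        drive_en = 0;   // 1: testbench drives the bus, 0: bus released
    wire [7:0] data;

    // Testbench side of the tri-state bus
    assign data = drive_en ? drive_val : 8'bz;

    bsm #(.DATA_WIDTH(8), .ADDR_WIDTH(4)) dut (
        .clk(clk), .rstn(rstn), .rw(rw), .en(en), .data(data), .addr(addr)
    );

    always #5 clk = ~clk;

    initial begin
        #12 rstn = 1;
        // Write 8'hA5 to address 3 (sampled on the next rising edge)
        @(negedge clk);
        addr = 4'd3; drive_val = 8'hA5; drive_en = 1; rw = 1; en = 1;
        // Release the bus before switching to read
        @(negedge clk);
        drive_en = 0; en = 0; rw = 0;
        // Read address 3
        @(negedge clk);
        en = 1;
        // One rising edge later the registered read data is on the bus
        @(negedge clk);
        $display("read data = %h", data);
        en = 0;
        $finish;
    end
endmodule
```

With the bsm code as listed, the display line should report a5. The testbench changes its inputs on the falling clock edge so that they are stable when bsm samples them on the rising edge.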
2. Design matrix multiplication and averaging:
module matrix_multiplication_and_averaging (
input wire clk,
input wire rstn,
input wire [7:0] matrix_a[3:0][3:0], // 4x4 Matrix A, 8-bit unsigned integers
input wire [7:0] matrix_b[3:0][3:0], // 4x4 Matrix B, 8-bit unsigned integers
output reg [15:0] matrix_c[3:0][3:0], // 4x4 Matrix C for results of multiplication
output reg [15:0] matrix_p[1:0][1:0] // 2x2 Matrix P for averaged results
);
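integer i, j, k;
reg [15:0] acc; // Blocking accumulator for one dot product
// Matrix Multiplication: C = A * B
always @(posedge clk or negedge rstn) begin
    if (!rstn) begin
        // Reset all elements in matrix_c to zero
        for (i = 0; i < 4; i = i + 1)
            for (j = 0; j < 4; j = j + 1)
                matrix_c[i][j] <= 16'd0;
    end else begin
        // Perform multiplication. Each dot product is accumulated in acc with
        // blocking assignments and then registered once. Repeated non-blocking
        // assignments to matrix_c[i][j] inside the k loop would keep only the
        // last term, because every right-hand side reads the old value.
        for (i = 0; i < 4; i = i + 1) begin
            for (j = 0; j < 4; j = j + 1) begin
                acc = 16'd0; // Initialize the result cell
                for (k = 0; k < 4; k = k + 1)
                    acc = acc + (matrix_a[i][k] * matrix_b[k][j]);
                matrix_c[i][j] <= acc;
            end
        end
    end
end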
// Averaging 2x2 sub-matrix blocks from Matrix_C to form Matrix_P
always @(posedge clk or negedge rstn) begin
if (!rstn) begin
// Reset Matrix_P to zero
matrix_p[0][0] <= 16'd0;
matrix_p[0][1] <= 16'd0;
matrix_p[1][0] <= 16'd0;
matrix_p[1][1] <= 16'd0;
end else begin
// Averaging each 2x2 block in Matrix_C to get Matrix_P
matrix_p[0][0] <= (matrix_c[0][0] + matrix_c[0][1] + matrix_c[1][0] +
matrix_c[1][1]) >> 2;
matrix_p[0][1] <= (matrix_c[0][2] + matrix_c[0][3] + matrix_c[1][2] +
matrix_c[1][3]) >> 2;
matrix_p[1][0] <= (matrix_c[2][0] + matrix_c[2][1] + matrix_c[3][0] +
matrix_c[3][1]) >> 2;
matrix_p[1][1] <= (matrix_c[2][2] + matrix_c[2][3] + matrix_c[3][2] +
matrix_c[3][3]) >> 2;
end
end
endmodule
Explanation:
Matrix Multiplication (Matrix_C Calculation):
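The nested loops compute each element in Matrix_C as the dot product of a row of Matrix_A
and a column of Matrix_B.
The results are stored in a 4x4 matrix_c with 16-bit elements. A single 8-bit by 8-bit product
always fits in 16 bits, but the sum of four such products can reach 4 x 255 x 255 = 260100,
which needs 18 bits, so large inputs wrap modulo 2^16.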
Averaging 2x2 Sub-matrices (Matrix_P Calculation):
Each 2x2 block in matrix_c is averaged to create a single element in matrix_p.
The sum of each 2x2 sub-matrix is right-shifted by 2 (divided by 4, rounding down) to compute
the average.
This results in a 2x2 matrix_p, where each element is the average of a corresponding 2x2
block in matrix_c.
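When a sum of products is accumulated inside a clocked always block, the choice between blocking and non-blocking assignments changes the result. The standalone demo below is a sketch to illustrate this; the module name nba_accumulate_demo and all signal names are hypothetical. It computes the dot product of (1, 2, 3, 4) with itself in both styles after a single clock edge.

```verilog
// Standalone demo (illustrative names): blocking vs. non-blocking accumulation
module nba_accumulate_demo;
    reg        clk = 0;
    reg [7:0]  a [0:3];
    reg [7:0]  b [0:3];
    reg [15:0] sum_nba = 0;  // Accumulated with non-blocking assignments
    reg [15:0] sum_acc = 0;  // Accumulated through a blocking temporary
    reg [15:0] acc;
    integer    k;

    always @(posedge clk) begin
        // Every right-hand side below reads the old sum_nba, and only the
        // last scheduled update survives, so three of the products are lost.
        sum_nba <= 16'd0;
        for (k = 0; k < 4; k = k + 1)
            sum_nba <= sum_nba + a[k] * b[k];

        // The blocking temporary sees each partial sum immediately.
        acc = 16'd0;
        for (k = 0; k < 4; k = k + 1)
            acc = acc + a[k] * b[k];
        sum_acc <= acc;
    end

    initial begin
        a[0] = 1; a[1] = 2; a[2] = 3; a[3] = 4;
        b[0] = 1; b[1] = 2; b[2] = 3; b[3] = 4;
        #1 clk = 1;  // One rising edge
        #1 $display("non-blocking: %0d, blocking temp: %0d", sum_nba, sum_acc);
        $finish;
    end
endmodule
```

The non-blocking version ends up with 16 (only the last product, 4 x 4), while the blocking temporary gives the full dot product, 30.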
3. Design a circuit module called mma
(1) Source code of the DUT
module mma (
    input  wire        clk,          // Clock signal
    input  wire        rstn,         // Active low reset
    input  wire        kick_start,   // Start operation signal
    output reg         done,         // Operation complete signal
    output reg         rd_req,       // One-cycle pulse for Read request
    input  wire        rd_ack,       // Read acknowledge
    input  wire [15:0] rd_data,      // Read data bus
    output reg  [7:0]  rd_addr,      // Read address
    input  wire [7:0]  rd_addr_base, // Read base address
    output reg         wr_req,       // One-cycle pulse for Write request
    input  wire        wr_ack,       // Write acknowledge
    output reg  [15:0] wr_data,      // Write data bus
    output reg  [7:0]  wr_addr,      // Write address
    input  wire [7:0]  wr_addr_base  // Write base address
);
// State definitions
localparam IDLE          = 3'b000,
           READ_MATRIX_A = 3'b001,
           READ_MATRIX_B = 3'b010,
           CALC_MATRIX_C = 3'b011,
           CALC_MATRIX_P = 3'b100,
           WRITE_RESULT  = 3'b101,
           DONE          = 3'b110;
reg [2:0]  state, next_state;     // Current and next state
reg [15:0] Buffer_A [15:0];       // Flattened 4x4 Matrix A
reg [15:0] Buffer_B [15:0];       // Flattened 4x4 Matrix B
reg [15:0] Buffer_C [15:0];       // Flattened 4x4 Matrix C
reg [15:0] Buffer_P [3:0];        // Flattened 2x2 Matrix P
reg [3:0]  i, j, k;               // Loop indices for operations
reg rd_req_enable, wr_req_enable; // Internal enable signals for requests
// Read/Write request counters
reg [1:0] rd_req_count;           // Counter to manage repeated rd_req pulses
reg [1:0] wr_req_count;           // Counter to manage repeated wr_req pulses
// State transition logic
always @(posedge clk or negedge rstn) begin
if (!rstn)
state <= IDLE;
else
state <= next_state;
end
// State machine logic (combinational next-state and output decode)
always @(*) begin
    // Default assignments keep every output driven on all paths (no latches)
    next_state    = state;
    done          = 0;
    rd_req_enable = 0;
    wr_req_enable = 0;
    rd_addr       = rd_addr_base;
    wr_addr       = wr_addr_base;
    wr_data       = 16'd0;
    case (state)
        IDLE: begin
            if (kick_start)
                next_state = READ_MATRIX_A;
        end
        READ_MATRIX_A: begin
            rd_req_enable = 1; // Enable rd_req generation
            rd_addr = rd_addr_base + i * 4 + j;
            if (rd_ack && i == 3 && j == 3)
                next_state = READ_MATRIX_B;
        end
        READ_MATRIX_B: begin
            rd_req_enable = 1; // Enable rd_req generation
            rd_addr = rd_addr_base + 16 + i * 4 + j;
            if (rd_ack && i == 3 && j == 3)
                next_state = CALC_MATRIX_C;
        end
        CALC_MATRIX_C: next_state = CALC_MATRIX_P; // Buffer_C is computed in the datapath block
        CALC_MATRIX_P: next_state = WRITE_RESULT;  // Buffer_P is computed in the datapath block
        WRITE_RESULT: begin
            wr_req_enable = 1; // Enable wr_req generation
            wr_addr = wr_addr_base + i * 2 + j;
            wr_data = Buffer_P[i * 2 + j];
            if (wr_ack && i == 1 && j == 1)
                next_state = DONE;
        end
        DONE: begin
            done = 1;
            next_state = IDLE;
        end
        default: next_state = IDLE;
    endcase
end
// Datapath: the buffers are written on the clock edge, so no latches are inferred
always @(posedge clk) begin : datapath
    integer r, c, m; // Local loop indices; i and j are driven only by the counter block
    reg [15:0] acc;  // Blocking accumulator for one dot product
    case (state)
        READ_MATRIX_A: if (rd_ack) Buffer_A[i * 4 + j] <= rd_data;
        READ_MATRIX_B: if (rd_ack) Buffer_B[i * 4 + j] <= rd_data;
        CALC_MATRIX_C: begin
            for (r = 0; r < 4; r = r + 1)
                for (c = 0; c < 4; c = c + 1) begin
                    acc = 16'd0;
                    for (m = 0; m < 4; m = m + 1)
                        acc = acc + (Buffer_A[r * 4 + m] * Buffer_B[m * 4 + c]);
                    Buffer_C[r * 4 + c] <= acc;
                end
        end
        CALC_MATRIX_P: begin
            Buffer_P[0] <= (Buffer_C[0] + Buffer_C[1] + Buffer_C[4] + Buffer_C[5]) >> 2;
            Buffer_P[1] <= (Buffer_C[2] + Buffer_C[3] + Buffer_C[6] + Buffer_C[7]) >> 2;
            Buffer_P[2] <= (Buffer_C[8] + Buffer_C[9] + Buffer_C[12] + Buffer_C[13]) >> 2;
            Buffer_P[3] <= (Buffer_C[10] + Buffer_C[11] + Buffer_C[14] + Buffer_C[15]) >> 2;
        end
        default: ; // No buffer updates in the other states
    endcase
end
// Read request generation (one-cycle pulse)
always @(posedge clk or negedge rstn) begin
if (!rstn) begin
rd_req <= 0;
rd_req_count <= 0;
end else if (rd_req_enable && rd_req_count == 0) begin
rd_req <= 1;
rd_req_count <= rd_req_count + 1;
end else if (rd_ack) begin
rd_req <= 0;
rd_req_count <= 0; // Reset counter when ack received
end else begin
rd_req <= 0;
end
end
// Write request generation (one-cycle pulse)
always @(posedge clk or negedge rstn) begin
if (!rstn) begin
wr_req <= 0;
wr_req_count <= 0;
end else if (wr_req_enable && wr_req_count == 0) begin
wr_req <= 1;
wr_req_count <= wr_req_count + 1;
end else if (wr_ack) begin
wr_req <= 0;
wr_req_count <= 0; // Reset counter when ack received
end else begin
wr_req <= 0;
end
end
// Address and loop counter logic
always @(posedge clk or negedge rstn) begin
if (!rstn) begin
i <= 0;
j <= 0;
end else if ((rd_ack) && (state == READ_MATRIX_A || state ==
READ_MATRIX_B)) begin
if (j < 3)
j <= j + 1;
else begin
j <= 0;
if (i < 3)
i <= i + 1;
else
i <= 0;
end
end else if ((wr_ack) && (state == WRITE_RESULT)) begin
if (j < 1)
j <= j + 1;
else begin
j <= 0;
if (i < 1)
i <= i + 1;
else
i <= 0;
end
end else if (state == CALC_MATRIX_C || state == CALC_MATRIX_P || state ==
IDLE) begin
i <= 0;
j <= 0;
end
end
endmodule
(2) Source code of the testbench
module mma_tb;
reg clk;
reg rstn;
reg kick_start;
wire done;
wire rd_req;
reg rd_ack;
reg [15:0] rd_data;
wire [7:0] rd_addr;
reg [7:0] rd_addr_base;
wire wr_req;
reg wr_ack;
wire [15:0] wr_data;
wire [7:0] wr_addr;
reg [7:0] wr_addr_base;
// Instantiate the DUT (Device Under Test)
mma uut (
.clk(clk),
.rstn(rstn),
.kick_start(kick_start),
.done(done),
.rd_req(rd_req),
.rd_ack(rd_ack),
.rd_data(rd_data),
.rd_addr(rd_addr),
.rd_addr_base(rd_addr_base),
.wr_req(wr_req),
.wr_ack(wr_ack),
.wr_data(wr_data),
.wr_addr(wr_addr),
.wr_addr_base(wr_addr_base)
);
// Clock generation
always #5 clk = ~clk;
// Simulated memory
reg [15:0] SystemMemory_1 [0:63];
reg [15:0] SystemMemory_2 [0:15];
initial begin
    // Initialize testbench signals
    clk = 0;
    rstn = 0;
    kick_start = 0;
    rd_ack = 0;
    wr_ack = 0;
    rd_addr_base = 8'd0;
    wr_addr_base = 8'd0;
    // Reset and initialize memory
    initialize_memory();
    #20 rstn = 1;
    // Start MMA operation
    #10 kick_start = 1;
    #10 kick_start = 0;
    // Wait for completion. The memory models run in their own initial blocks
    // below, because a fork...join around two forever loops never returns.
    wait(done);
    // Verify results
    verify_results();
    $finish;
end
// Run memory simulation concurrently with the stimulus above
initial simulate_read();
initial simulate_write();
// Task to initialize memory with test data
task initialize_memory;
integer i, j;
begin
for (i = 0; i < 4; i = i + 1)
for (j = 0; j < 4; j = j + 1) begin
SystemMemory_1[rd_addr_base + i * 4 + j] = i + j + 1;
SystemMemory_1[rd_addr_base + 16 + i * 4 + j] = (i + 1) * (j + 1);
end
end
endtask
// Simulate read operation
task simulate_read;
begin
    forever @(posedge clk) begin
        if (rd_req) begin
            #15 rd_ack = 1;                    // Simulate read delay
            rd_data = SystemMemory_1[rd_addr]; // Provide data
            #10 rd_ack = 0;                    // Clear rd_ack
        end
    end
end
endtask
// Simulate write operation
task simulate_write;
begin
    forever @(posedge clk) begin
        if (wr_req) begin
            #15 wr_ack = 1;                    // Simulate write delay
            SystemMemory_2[wr_addr] = wr_data; // Write data to memory
            #10 wr_ack = 0;                    // Clear wr_ack
        end
    end
end
endtask
// Task to verify the results in memory
task verify_results;
integer i, j;
begin
$display("Verifying results...");
for (i = 0; i < 2; i = i + 1) begin
for (j = 0; j < 2; j = j + 1) begin
$display("Result [%0d][%0d]: %0d", i, j,
SystemMemory_2[wr_addr_base + i * 2 + j]);
end
end
end
endtask
endmodule
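The verify_results task only prints what the DUT wrote, so it helps to know what values to expect. The following standalone reference model is a sketch, not part of the original submission; the module name mma_golden_model and its signal names are hypothetical. It rebuilds the same test pattern as initialize_memory, then computes C = A * B and the 2x2 block averages with the same 16-bit arithmetic as the DUT.

```verilog
// Standalone reference model (illustrative names) for the testbench data pattern
module mma_golden_model;
    reg [15:0] a [0:15];  // Flattened 4x4 Matrix A
    reg [15:0] b [0:15];  // Flattened 4x4 Matrix B
    reg [15:0] c [0:15];  // Flattened 4x4 Matrix C
    reg [15:0] p [0:3];   // Flattened 2x2 Matrix P
    reg [15:0] acc;
    integer    r, col, m;

    initial begin
        // Same data pattern as initialize_memory in the testbench
        for (r = 0; r < 4; r = r + 1)
            for (col = 0; col < 4; col = col + 1) begin
                a[r * 4 + col] = r + col + 1;
                b[r * 4 + col] = (r + 1) * (col + 1);
            end
        // C = A * B
        for (r = 0; r < 4; r = r + 1)
            for (col = 0; col < 4; col = col + 1) begin
                acc = 16'd0;
                for (m = 0; m < 4; m = m + 1)
                    acc = acc + a[r * 4 + m] * b[m * 4 + col];
                c[r * 4 + col] = acc;
            end
        // P = average of each 2x2 block of C, rounded down
        for (r = 0; r < 2; r = r + 1)
            for (col = 0; col < 2; col = col + 1) begin
                p[r * 2 + col] = (c[(2 * r) * 4 + 2 * col]
                                + c[(2 * r) * 4 + 2 * col + 1]
                                + c[(2 * r + 1) * 4 + 2 * col]
                                + c[(2 * r + 1) * 4 + 2 * col + 1]) >> 2;
                $display("Expected [%0d][%0d]: %0d", r, col, p[r * 2 + col]);
            end
        $finish;
    end
endmodule
```

For this data pattern, row i of C is (30 + 10i) times (1, 2, 3, 4), so the model prints 52, 122, 82 and 192 for P[0][0], P[0][1], P[1][0] and P[1][1]. These are the values the testbench's Result lines should show.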
(3) Synthesis from Vivado
(4) Waveforms showing the working process
(5) Design specification with circuit architecture, FSM, explanation of key design
Code Explanation:
State Machine:
The module uses a finite state machine (FSM) to control the operation, with several states:
IDLE, READ_MATRIX_A, READ_MATRIX_B, CALC_MATRIX_C, CALC_MATRIX_P,
WRITE_RESULT, and DONE.
Each state performs specific tasks, such as reading data for Matrix_A and Matrix_B from
system memory, calculating Matrix_C and Matrix_P, and writing the results back to memory.
Read/Write Request Timing Control:
The rd_req and wr_req signals are used to control read and write requests, respectively.
According to the timing requirements, there is a 1–3 clock cycle delay for rd_ack and wr_ack.
During the READ_MATRIX_A and READ_MATRIX_B states, rd_req is pulsed for one clock cycle per
element, and rd_data is captured when rd_ack arrives.
In the WRITE_RESULT state, wr_req is pulsed in the same way, and the next element is presented
after wr_ack is received.
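The 1–3 cycle acknowledge delay can also be modeled synchronously instead of with # delays. The sketch below is one possible responder, under the assumption that a request is a one-cycle pulse and the acknowledge is a one-cycle pulse. The module ack_delay_model, its delay input, and the small testbench are hypothetical names introduced for illustration, and the model covers only the ack timing, not the data path.

```verilog
// Hypothetical responder: ack pulses `delay` clock edges after req is sampled
module ack_delay_model (
    input  wire       clk,
    input  wire       rstn,
    input  wire       req,    // One-cycle request pulse
    input  wire [1:0] delay,  // Assumed knob: 1 to 3 cycles (0 is treated as 1)
    output reg        ack     // One-cycle acknowledge pulse
);
    reg       busy;
    reg [1:0] cnt;

    always @(posedge clk or negedge rstn) begin
        if (!rstn) begin
            ack  <= 1'b0;
            busy <= 1'b0;
            cnt  <= 2'd0;
        end else begin
            ack <= 1'b0;              // Default keeps ack to a single cycle
            if (!busy && req) begin
                busy <= 1'b1;         // Request accepted, start counting
                cnt  <= delay;
            end else if (busy) begin
                if (cnt <= 2'd1) begin
                    ack  <= 1'b1;
                    busy <= 1'b0;
                end else
                    cnt <= cnt - 2'd1;
            end
        end
    end
endmodule

// Small testbench for the model above
module ack_delay_model_tb;
    reg  clk = 0;
    reg  rstn = 0;
    reg  req = 0;
    wire ack;

    ack_delay_model dut (.clk(clk), .rstn(rstn), .req(req), .delay(2'd2), .ack(ack));

    always #5 clk = ~clk;

    initial begin
        #12 rstn = 1;
        @(negedge clk) req = 1;  // One-cycle request pulse
        @(negedge clk) req = 0;
        repeat (4) @(negedge clk) $display("t=%0t ack=%b", $time, ack);
        $finish;
    end
endmodule
```

With delay set to 2, ack goes high two rising edges after the edge that samples req and stays high for one cycle, so the four samples show ack as 0, 1, 0, 0.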
Matrix Operations:
In the CALC_MATRIX_C state, matrix multiplication is performed to compute Matrix_C by
multiplying Matrix_A and Matrix_B.
In the CALC_MATRIX_P state, each 2x2 sub-matrix of Matrix_C is averaged to generate
Matrix_P.
Operation Completion Signal:
The done signal is asserted for one clock cycle after all operations are completed, indicating
the operation is finished. The FSM then returns to the IDLE state, waiting for the next
kick_start signal to initiate a new operation.
Detailed FSM State Descriptions:
IDLE: The module waits in the idle state until the kick_start signal is received. Once received,
it transitions to the READ_MATRIX_A state.
READ_MATRIX_A: In this state, rd_req is asserted to fetch Matrix_A row by row, starting at
the base address rd_addr_base. The data is stored in Buffer_A.
READ_MATRIX_B: Similar to READ_MATRIX_A, this state reads data from the memory
to populate Buffer_B. The base address is offset by 16 to avoid overlap with Matrix_A.
CALC_MATRIX_C: Performs matrix multiplication (C = A * B). Each element in Matrix_C
is calculated by summing the products of corresponding elements in the rows of Matrix_A
and columns of Matrix_B.
CALC_MATRIX_P: Computes Matrix_P by averaging each 2x2 block in Matrix_C. The
averaged values are stored in the 2x2 Buffer_P.
WRITE_RESULT: In this state, wr_req is asserted to write Matrix_P back to system memory
at the address specified by wr_addr_base. Each element of Matrix_P is written sequentially.
DONE: The done signal is asserted for one clock cycle, signaling the end of the operation.
The FSM then returns to IDLE to await the next kick_start signal.