Uploaded by Mintu Kumar

LAB5 Mintukumar2022296

LAB5
Name : Mintu Kumar
Rollno. : 2022296
Group members: Aman Kumar (2022060), Saumya Trivedi (2021198), Pankaj Yadav (2022347)
Introduction:
In this lab, I implemented a Finite Impulse Response (FIR) filter on a Xilinx Zynq board using two AXI Direct Memory Access (DMA) engines: one connected to the Accelerator Coherency Port (ACP) and the other to a High-Performance (HP) port. The purpose of this experiment is twofold. The first aim is to compare the execution time of the FIR filter implemented in software on the Processing System (PS) with the hardware-accelerated version on the Programmable Logic (PL). The second aim is to compare the two ports with each other.
The ACP provides cache-coherent data transfers between the PS and the PL. Its DMA accesses pass through the CPU's snoop control unit, so the PL sees the same data as the CPU caches without explicit cache maintenance. The HP port, on the other hand, is designed for high-throughput transfers directly to DDR memory, but it is not cache-coherent. Using both ports, I was able to compare how each method performs in terms of speed and efficiency.
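For reference, the software FIR on the PS computes y[n] = h[0]x[n] + h[1]x[n-1] + ... + h[N-1]x[n-N+1]. The sketch below is a direct-form implementation in C. It assumes integer samples (matching the int buffers used later) and treats samples before the start of the input as zero. The function name fir_sw and the 4-tap coefficients in the usage note are hypothetical and are not taken from the lab code.

```c
#include <stddef.h>

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n - k].
 * Samples with negative index (n - k < 0) are treated as zero.
 * Hypothetical helper; the lab's actual PS routine is not shown. */
void fir_sw(const int *x, int *y, size_t len, const int *h, size_t taps)
{
    for (size_t n = 0; n < len; n++) {
        long long acc = 0;                 /* wide accumulator limits overflow */
        for (size_t k = 0; k < taps && k <= n; k++) {
            acc += (long long)h[k] * x[n - k];
        }
        y[n] = (int)acc;                   /* narrow back to the output width */
    }
}
```

As a quick sanity check, an impulse input {1, 0, 0, 0, 0} with taps {1, 2, 3, 4} produces {1, 2, 3, 4, 0}. Any FIR implementation, software or hardware, should return its own coefficients for an impulse.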
Block Diagram:
Block Diagram with ILA:
Address Mapping:
Code:
// 2) Configure ACP DMA and send the data
XAxiDma DMA_instance;
XAxiDma_Config *DMA_Config = XAxiDma_LookupConfig(XPAR_AXI_DMA_ACP_DEVICE_ID);
...
XAxiDma_CfgInitialize(&DMA_instance, DMA_Config);
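XPAR_AXI_DMA_ACP_DEVICE_ID is defined in xparameters.h, which is generated from the Vivado hardware design when the board support package is built.
XAxiDma_LookupConfig returns the configuration for that DMA instance, and XAxiDma_CfgInitialize uses it to initialize the driver so that transfers can be started.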
// 3) Start timing for the PL-based FIR via ACP
XTime PL_Start, PL_End;
XTime_SetTime(0);
XTime_GetTime(&PL_Start);
// Arm the receive channel first: the PL writes the filtered output to FIR_PL_Output
status = XAxiDma_SimpleTransfer(&DMA_instance, (UINTPTR)FIR_PL_Output, FIR_Data_Size * sizeof(int), XAXIDMA_DEVICE_TO_DMA);
// Then send the input samples in FIR_input to the PL
status = XAxiDma_SimpleTransfer(&DMA_instance, (UINTPTR)FIR_input, FIR_Data_Size * sizeof(int), XAXIDMA_DMA_TO_DEVICE);
// Wait for TX (DMA_TO_DEVICE) to complete
...
// Wait for RX (DEVICE_TO_DMA) to complete
...
XTime_GetTime(&PL_End);
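XAxiDma_SimpleTransfer is called twice. The first call arms the receive channel (DEVICE_TO_DMA) so that the output array from the PL lands in FIR_PL_Output. The second call sends the input array to the PL (DMA_TO_DEVICE).
The code then polls the DMA status registers until both the transmit (TX) and receive (RX) channels have finished.
The whole sequence is bracketed by XTime_GetTime calls. XTime counts global-timer ticks, so the difference PL_End - PL_Start is scaled by COUNTS_PER_SECOND to obtain the PL execution time in microseconds.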
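The wait-then-timestamp sequence can be sketched in portable C as follows. On the board, the busy predicate would be XAxiDma_Busy(&DMA_instance, XAXIDMA_DMA_TO_DEVICE) (or XAXIDMA_DEVICE_TO_DMA), and counts_per_second would be COUNTS_PER_SECOND from xtime_l.h. On the Zynq-7000 that constant is half the Cortex-A9 clock. Everything else here is a hypothetical addition so the logic can be tested off-board: the poll limit, the simulated channel and all function names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Busy predicate; on hardware this would wrap XAxiDma_Busy(). */
typedef bool (*busy_fn)(void *ctx);

/* Poll until the channel reports idle. Returns true if it became idle
 * within max_polls checks, false on timeout (a stuck DMA cannot hang). */
bool wait_until_idle(busy_fn busy, void *ctx, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (!busy(ctx)) {
            return true;
        }
    }
    return false;
}

/* Convert a tick difference to microseconds. */
double ticks_to_us(uint64_t start, uint64_t end, uint64_t counts_per_second)
{
    return (double)(end - start) * 1e6 / (double)counts_per_second;
}

/* Simulated channel for host testing: busy for the first `remaining` polls. */
typedef struct {
    uint32_t remaining;
} sim_channel;

bool sim_busy(void *ctx)
{
    sim_channel *ch = (sim_channel *)ctx;
    if (ch->remaining > 0) {
        ch->remaining--;
        return true;
    }
    return false;
}
```

Bounding the loop is a design choice. The plain `while (XAxiDma_Busy(...)) {}` form used in many examples would spin forever if the PL never returns data, whereas a timeout lets the test report the failure.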
The HP test has a similar structure to comp_PSvsACP, but it uses the DMA connected to the HP port:
// 2) Configure HP DMA
XAxiDma DMA_instanceHP;
XAxiDma_Config *DMA_ConfigHP = XAxiDma_LookupConfig(XPAR_AXI_DMA_HP_DEVICE_ID);
XAxiDma_CfgInitialize(&DMA_instanceHP, DMA_ConfigHP);
// 3) Flush caches, start timing, do DMA transfers
Xil_DCacheFlushRange((UINTPTR)FIR_input, sizeof(int)*FIR_Data_Size);
Xil_DCacheFlushRange((UINTPTR)FIR_PL_Output, sizeof(int)*FIR_Data_Size);
...
// Wait for TX/RX to complete
...
Xil_DCacheInvalidateRange((UINTPTR)FIR_PL_Output, sizeof(int)*FIR_Data_Size);
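Cache maintenance works on whole cache lines. On the Zynq-7000's Cortex-A9, both the L1 data cache and the L2 cache use 32-byte lines; the sketch below assumes that size. A flush or invalidate of [addr, addr + len) therefore touches every line the range overlaps. If FIR_input or FIR_PL_Output does not start and end on a line boundary, the edge lines are shared with neighbouring data, and maintenance on them can corrupt either that data or the DMA result. The hypothetical helpers below compute the line-aligned range a maintenance call actually covers and check whether a buffer owns its lines exclusively.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* assumed Cortex-A9 L1/L2 line size in bytes */

typedef struct {
    uintptr_t first;     /* address of the first cache line touched */
    uintptr_t end;       /* one past the last byte of the last line touched */
} line_range;

/* Line-aligned range covered by cache maintenance on [addr, addr + len). */
line_range covered_lines(uintptr_t addr, size_t len)
{
    line_range r;
    r.first = addr & ~(uintptr_t)(CACHE_LINE - 1);
    r.end = (addr + len + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
    return r;
}

/* Nonzero if the buffer shares no cache line with neighbouring data. */
int is_line_exclusive(uintptr_t addr, size_t len)
{
    return (addr % CACHE_LINE == 0) && (len % CACHE_LINE == 0);
}
```

A common precaution follows from this. If FIR_Data_Size is a compile-time constant, declare the buffers with `__attribute__((aligned(32)))` and choose a size for which FIR_Data_Size * sizeof(int) is a multiple of 32 bytes.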
Differences
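We explicitly flush the data cache before sending data to the PL, and invalidate it after receiving data back. This is needed for non-coherent ports such as HP because the DMA reads and writes DDR directly. Input still held in the CPU caches must first be written back to memory, and stale cached copies of the output must be discarded before the CPU reads the result.
The rest of the steps (timing, checking for mismatches, printing results) are similar to the ACP test.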
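The mismatch check can be as simple as an element-by-element comparison of the PS and PL output buffers. The sketch below assumes that both buffers hold the same number of int samples (FIR_Data_Size in the lab code). It also assumes that the PL produces bit-exact results, which requires integer arithmetic and the same coefficients. The name count_mismatches and the reporting limit are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Compare software (PS) and hardware (PL) FIR outputs.
 * Prints up to max_report differing indices; returns the total mismatch count. */
size_t count_mismatches(const int *ps_out, const int *pl_out, size_t len,
                        size_t max_report)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < len; i++) {
        if (ps_out[i] != pl_out[i]) {
            if (mismatches < max_report) {
                printf("mismatch at %lu: PS=%d PL=%d\n",
                       (unsigned long)i, ps_out[i], pl_out[i]);
            }
            mismatches++;
        }
    }
    return mismatches;
}
```

A return value of zero means the hardware output matches the software reference exactly. A nonzero count on the HP path but not the ACP path would point to missing or misplaced cache maintenance.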
Output:
AXI Port | PS (Software FIR) Time | PL (Hardware FIR) Time | Observations / Reason
ACP | 64.984619 µs | 5.073846 µs | ACP is cache-coherent, reducing overhead for small data. Hardware acceleration on the PL is significantly faster than software on the PS.
HP | 64.510773 µs | 7.006154 µs | The HP port requires manual cache flush/invalidate. Slightly more overhead than ACP in this example, but still much faster than the PS.
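The speed-up of the PL over the PS follows directly from the table as the ratio of the two times. The small C helpers below compute that ratio and the relative difference between the two ports. They are hypothetical and not part of the lab code.

```c
/* Speed-up of hardware over software: how many times faster the PL run is. */
double speedup(double ps_time_us, double pl_time_us)
{
    return ps_time_us / pl_time_us;
}

/* Extra time taken by run b relative to run a, in percent. */
double extra_time_pct(double a_us, double b_us)
{
    return (b_us - a_us) / a_us * 100.0;
}
```

The measured values give speedup(64.984619, 5.073846) ≈ 12.8 for ACP and speedup(64.510773, 7.006154) ≈ 9.2 for HP. extra_time_pct(5.073846, 7.006154) ≈ 38, so the HP hardware run took about 38% longer than the ACP run.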
Key Points
1. PS (Software) vs. PL (Hardware) Execution:
In both ACP and HP scenarios, the PL (hardware FIR) is significantly faster than the
PS (software FIR). This highlights the benefit of offloading the FIR filtering to
dedicated hardware in the programmable logic.
2. ACP vs. HP:
The ACP port is cache-coherent, so no explicit cache maintenance is needed and DMA accesses can be served from the CPU caches. For relatively small data sets, this gives a shorter total execution time (about 5 µs in this example).
The HP port is optimized for high throughput, especially for large data transfers. However, it is not coherent and requires explicit cache maintenance (flush/invalidate), which adds some overhead for small data sets (about 7 µs here).
3. Overall Performance:
Both hardware-accelerated (PL) methods are much faster than software running on
the PS.
The choice between ACP and HP depends on data size and cache-coherency needs. For small data, ACP can be slightly faster thanks to its built-in coherency. For large streaming data, the HP port may become more efficient.