PROJECT - VIDEO MANIPULATOR (BASED ON ZED BOARD)
MID-SEMESTER PRESENTATION
Yakir Peretz, Idan Homri
Supervisor - Rolf Hilgendorf
Semester - winter 2014
Duration - one semester

AGENDA
1. Project goals
2. Component description
3. Data flow
4. Required tests and check points
5. Clock definitions
6. Software description
7. Critical issues and solutions
8. Complete Program (without Uart)

PROJECT GOALS
Creating a system that reads images from an external device, saves them in memory and displays them through the RGB output.
Creating a programmable logic design that handles the transport of the data from the main memory to the VGA output via a video direct memory access (VDMA) component.

COMPONENT DESCRIPTION - ZYNQ
In the design we use the following components:
ZYNQ processor
The ZYNQ is the PS part of the design, meaning the software-programmable part. This part is very powerful and includes many features, of which we use the following:
UART connection
Memory controller
The memory itself - DDR3
One ARM processor - Cortex-A9
All the interface connections needed to the other components on the PL side are built in.
All the clocks of the design are generated by the ZYNQ and distributed to the relevant components.
(Figure: PS side overview)

COMPONENT DESCRIPTION - VDMA
VDMA
The VDMA is the core of the PL side of the design. It is responsible for transporting the data from the memory to the stream part. It is connected to three other components via three buses:
1. To the processor via AXI4-Lite, to receive the address and size of the data to fetch from the memory.
2. To the memory controller via full AXI4, to fetch the data.
3. To the “stream_to_video_out” via AXI4-Stream, to send the data on.
The data transport to the stream part is timed by the VTC timing signals.
(Figure: VDMA)

COMPONENT DESCRIPTION - VTC
Video timing controller
This component is responsible for timing the data transfer from the VDMA to the stream to video out component. It generates signals for the vertical data transfer (line count) and the horizontal data transfer (pixels per line), as well as the active video signal. It runs on a clock chosen to fit the picture size and the number of pictures per second (see Clock definitions).
(Figure: Video timing controller)

COMPONENT DESCRIPTION - STREAM
Stream to video out
The AXI4-Stream to Video Out core converts the AXI4-Stream Video protocol, used by Xilinx video processing cores such as the VDMA, into a video output with explicit sync and timing, such as the unit we built to communicate with the VGA port. In our project, the unit converts the VDMA output from the AXI4-Stream protocol to an actual video protocol that consists of:
1. Active data signal
2. Vertical sync and horizontal sync
3. Blank periods
(Figure: “Stream to video out” interface)

COMPONENT DESCRIPTION - RGB_OUT
RGB_out
We built this unit to convert the data from 8 bits per color (red, green and blue) to 4 bits per color. The output of this unit feeds the VGA port: RGB at 4 bits per color, Vsync and Hsync.
(Figure: RGB_out)

DATA FLOW
Step 1: Sending the data from an external device to the UART (bitmap to pixels only).
Step 2: Extracting the data from the UART and saving it to the memory.
Step 3: The ZYNQ processor triggers the VDMA by sending the start address and the size of the data stored in memory over the AXI4-Lite bus.
Step 4: The data is transferred to the VDMA via the memory controller and saved in a frame buffer.
Step 5: The VDMA sends the data to the “stream to video out” (STVO) unit, timed by the VTC.
Step 6: The data is transferred from the STVO to the RGB_out component, in order to fit the VGA port.
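The RGB_out conversion described above can be modelled in a few lines. This is only a sketch: it assumes, as the critical issues slide states, that the unit keeps the top 4 bits of each color. The function name rgb888_to_rgb444 is hypothetical, and the real unit is a hardware block, so this models the bit selection only.

```python
def rgb888_to_rgb444(red, green, blue):
    """Keep the top 4 bits of each 8-bit color component (assumed behavior)."""
    for value in (red, green, blue):
        if not 0 <= value <= 0xFF:
            raise ValueError("each color component must fit in 8 bits")
    # Shifting right by 4 drops the low nibble and keeps the high nibble.
    return red >> 4, green >> 4, blue >> 4


print(rgb888_to_rgb444(0xFF, 0x80, 0x0F))  # (15, 8, 0)
```

Note that dropping the low bits means 0x0F maps to 0, so dark shades that differ only in the low nibble become indistinguishable on the VGA output.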
REQUIRED TESTS AND CHECK POINTS
We have several strategic check points for validating our design:
UART to memory - we first check that the data delivered from an external device through the UART is saved in memory at the intended location.
Memory to VDMA - we check that the data is transferred correctly from the memory to the frame buffers inside the VDMA.
VDMA to “stream_to_video_out” - we check that the data is transferred correctly from the VDMA to the stream to video out by reading the data running on the AXI4-Stream bus.
Control signals - we need to check that the “video_timing_controller” sends the timing signals as we expect.
“stream_to_video_out” to VGA - we check that the data from the “stream_to_video_out” is sent in the intended 24-bit format (8 bits per color, 3 colors R, G, B).
VGA output - we need to check that the data at the output of the VGA component is the picture we delivered. It should be displayed on the screen.
(Figure: Block Diagram)

CLOCK DEFINITIONS
There are 2 main clocks in the design (besides the ARM clock and the DDR clock).
The faster clock is used for the AXI4-Lite bus that connects the ARM and the VDMA. On that bus the ARM transfers the address and the size of the picture in the memory. This clock is set to 200 MHz.
The slower clock is used for the full AXI4 bus and the AXI4-Stream bus. On those buses we move the data from the memory to the VDMA and then from the VDMA to the “stream to video out” unit. This clock is set to 148.5 MHz, that is 148.5 megapixels per second, to fit the amount of data transferred in one second:
(number of pixels per line including blank) * (number of lines including blank) * (number of pictures per second)
For us: 2200 * 1125 * 60 = 148,500,000, that is 148.5 [MHz].
To fit the screen in the lab we needed a picture of 1080*1920 pixels at 60 pictures per second.
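The pixel clock arithmetic above can be checked with a short sketch. The function name pixel_clock_hz is hypothetical; the numbers are the totals given in the presentation (2200 pixels per line and 1125 lines, both including blanking, at 60 pictures per second).

```python
def pixel_clock_hz(total_pixels_per_line, total_lines, frames_per_second):
    """Pixel clock = pixels per line * lines * frame rate, blanking included."""
    return total_pixels_per_line * total_lines * frames_per_second


clock = pixel_clock_hz(2200, 1125, 60)
print(clock / 1e6)  # 148.5
```

The same function can be reused for another screen format, as long as the totals passed in include the blanking periods and not only the visible picture.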
BLOCK DIAGRAM
(Figure: block diagram showing the VDMA sub system and the ZYNQ sub system)

PS SIDE OVERVIEW
(Figure: PS side, pin to pin. Callouts: to VDMA via AXI4-Lite; to VDMA via AXI4)

“STREAM TO VIDEO OUT” INTERFACE
(Figure callouts: from VDMA; to RGB_out; from video timing controller)

VIDEO TIMING CONTROLLER
(Figure callouts: for write channel - not in use; optionally can be controlled by the processor - not in use; output timing signals for the stream_to_video_out unit)

RGB_OUT
(Figure callouts: input data - 8 bits per color; input sync signals; output data - 4 bits per color; output sync signals)

VDMA
(Figure callouts: connected to memory on a full AXI4 bus, required for data transfer; connection to the processor, which transports the address and the size of the picture; connected to STVO on an AXI4-Stream bus, required for data transfer; the 3 clocks of the design)

SOFTWARE DESCRIPTION
Software Flowchart
Host - Matlab: rescaling of the BMP image, opening the UART for writing and sending the image.
ZedBoard - C code (SDK):
Image initialization - white image
VDMA configuration and setup
VDMA start transfer
Load the incoming image from the UART into DDR.
Start transmission of the new image.

CRITICAL ISSUES AND SOLUTIONS
1. Problem: UART buffer is limited and uninitialized. Solution: see UART issues.
2. Problem: Stream To Video Out always outputs 0. Solution: Hardware - connect reset to active low and tie clock enable to constant 1. Software - parking on frames.
3. Problem: Screen requires a picture format of 1080*1920 at 60 [Hz]. Solution: Hardware - configure the peripheral clock to 2200*1125*60 = 148.5 [MHz].
4. Problem: Image with “noise” - inconsistent writing problem. Solution: Software - disable caches.
5. Problem: Video Timing Controller driver isn’t loaded into the SDK. Solution: the SDK loads drivers only for components that are connected directly to the ZYNQ with AXI4-Lite.
6. Solution: the RGB_out unit takes the top 4 bits of each color. Problem:
The output of the stream to video out is 8 bits per color, and we needed to fit it to the VGA port, which is 4 bits per color.

UART ISSUES
1. Problem: Colors of the picture are completely mixed. Solution: Matlab - we sent the same data in a loop until it was read by the board, instead of sending it once.
2. Problem: The shape was almost correct, but the colors were different from the original picture. Solution: The UART buffer works as a FIFO, so we saved the data in memory in reverse order, so that it was stored correctly.
3. Problem: The picture on the screen starts with an offset. Solution: We found a function that clears the UART FIFO at the beginning of the read operation. The function is setOptions(uart,reset).
4. Problem: The data transfer got stuck every time, after a different number of transfers. Solution: We created a simple handshake protocol in which Matlab sends the data to the UART in 65-byte blocks (the UART's buffer size) and waits for an acknowledge from the board.
5. Problem: Every so often one transfer is unsuccessful. When that happens, the picture on the screen is shifted and the colors change. If the failure is due to a lost acknowledge, meaning the data was received but Matlab thinks it was not because no ack was detected, we resend data that was already successfully received by the board. Solution: We continued sending the next 65 bytes instead of resending the previous ones.
6. Problem: The same thing happens when the communication is unsuccessful due to lost transmitted data. In that case we should resend the data. Solution: Create a protocol that can differentiate the reason for the error.

APPENDIX
SOFTWARE MATLAB
The main goal of the program is to create a Matlab GUI interface between the PC and the ZedBoard in order to load the desired image.
Step One: Determine the desired UART port configuration.
(8 data bits, 1 stop bit, 115200 baud rate)
Step Two: Load the bitmap image into Matlab and rescale its dimensions to 640 * 480 or 1080 * 1920.
Step Three: Open the port and send the information.
Step Four: For testing purposes only - read back the image and display it on screen.

C CODE - SDK
LOAD PICTURE INTO DDR
The program should read the incoming data and load it into the DDR.
At this point of the project we encountered a technical problem: the ZedBoard buffer size in polling mode is only 65 bytes, so there is currently no way to load the whole picture.
In order to continue with the development, the pictures were written manually into the DDR.
Problem - “works” only in Remote mode: Matlab is sending and receiving by itself, and no data is written into the ZedBoard buffer. This can indicate that the Matlab data transfer is fine but the ZedBoard reading is not.
Other UART modes (Local Loop, for example) get only the first 65 bytes.

C CODE - SDK
VDMA CONFIGURATION AND SETUP
Initialize DMA engine - a VDMA instance is set to the VDMA physical address.
Set up the read channel - the VDMA module uses only the read channel (MM2S). Setup of the vertical and horizontal lengths, the frame store start addresses, and other unused registers.

C CODE - SDK
VDMA START TRANSFER
Start the DMA engine to transfer - the VDMA read channel is activated.
Parking on a frame - the VDMA reads the same image repeatedly, in order to display the image on the screen continuously.
The VDMA jumps between two images using a counter. It performs temporary parking on each image.

INCONSISTENT WRITING PROBLEM
SOLUTION - DISABLE CACHES

COMPLETE PROGRAM (WITHOUT UART)
(Figure: block diagram, VDMA sub system)

ZYNQ SUB SYSTEM
(Figure: block diagram, ZYNQ sub system)
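The UART issues listed earlier ask for a protocol that can tell a lost acknowledge from lost data. One common way to do this, which is an assumption here and not something the presentation specifies, is a stop-and-wait scheme with an alternating sequence bit. The sketch below simulates it; Board, send_image, lose_data_on and lose_ack_on are all hypothetical names, and the 65-byte block size is the buffer size reported above.

```python
BLOCK_SIZE = 65  # UART buffer size reported in the presentation


class Board:
    """Hypothetical receiver: stores each block once and always acknowledges."""

    def __init__(self):
        self.memory = bytearray()
        self.expected_seq = 0

    def receive(self, seq, block):
        if seq == self.expected_seq:
            # New block: store it and expect the other sequence bit next.
            self.memory.extend(block)
            self.expected_seq ^= 1
        # A duplicate (old sequence bit) is acknowledged but not stored again.
        return seq


def send_image(data, board, lose_data_on=(), lose_ack_on=()):
    """Stop-and-wait sender; the two sets name attempts that fail (simulation)."""
    seq, attempt = 0, 0
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        while True:
            attempt += 1
            if attempt in lose_data_on:
                continue  # block lost on the way: the board never saw it, resend
            ack = board.receive(seq, block)
            if attempt in lose_ack_on:
                continue  # ack lost: the board has the block, resend anyway
            if ack == seq:
                break
        seq ^= 1
    return attempt


image = bytes(range(200))
board = Board()
attempts = send_image(image, board, lose_data_on={1}, lose_ack_on={3})
print(bytes(board.memory) == image, attempts)  # True 6
```

With this scheme the sender always resends after a failure, and the receiver uses the sequence bit to discard a block it already stored, so neither case shifts the picture in memory.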