PROJECT: VIDEO MANIPULATOR (BASED ON ZED BOARD)
FINAL PRESENTATION

Yakir Peretz, Idan Homri
Supervisor: Rolf Hilgendorf
Semester: winter 2014
Duration: one semester

AGENDA
1. Project goals
2. Component description
3. Data flow
4. Required tests and check points
5. Clock definitions
6. Software description
7. Critical issues and solutions
8. Complete Program (without Uart)

PROJECT GOALS
- Create a system that reads images from an external device, saves them in memory and displays them over VGA.
- Create a programmable logic design that moves the data from main memory to the VGA output through a video direct memory access (VDMA) component.

COMPONENT DESCRIPTION - ZYNQ
The design uses the following components.
ZYNQ processor: the ZYNQ is the PS (processing system) part of the design, that is, the software-programmable part. It is very powerful and includes many features, of which we use the following:
- UART connection
- Memory controller
- The memory itself (DDR3)
- One ARM Cortex-A9 processor
All the interface connections needed to reach the components on the PL side are built in. All the clocks of the design are generated by the ZYNQ and distributed to the relevant components.

COMPONENT DESCRIPTION - VDMA
The VDMA is the core of the PL side of the design. It is responsible for moving the data from the memory to the stream part. It is connected to three other components via three buses:
1. To the processor via AXI4-Lite, to receive the address and the size of the data to fetch from the memory.
2. To the memory controller via full AXI4, to read the data.
3. To the "stream_to_video_out" unit via AXI4-Stream, to send the data.
The data is moved to the stream part in step with the VTC timing signals.
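The "address and size" that the processor hands to the VDMA can be illustrated with a little arithmetic. The sketch below is only an illustration: the field names, the start address and the 3 bytes per pixel (8 bits for each of R, G and B) are assumptions, not values taken from the project or from the Xilinx driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameGeometry:
    """Size of one stored picture (hypothetical helper, not a Xilinx type)."""
    width_px: int         # active pixels per line
    height_lines: int     # active lines per picture
    bytes_per_pixel: int  # assumption: 3 bytes, one each for R, G and B

    @property
    def line_bytes(self) -> int:
        # Number of bytes the VDMA has to fetch for one line.
        return self.width_px * self.bytes_per_pixel

    @property
    def frame_bytes(self) -> int:
        # Number of bytes occupied by the whole picture in memory.
        return self.line_bytes * self.height_lines


def read_request(start_address: int, geometry: FrameGeometry) -> dict:
    """Collect the values the processor would send over AXI4-Lite."""
    return {
        "start_address": start_address,
        "horizontal_bytes": geometry.line_bytes,
        "vertical_lines": geometry.height_lines,
    }


frame = FrameGeometry(width_px=1920, height_lines=1080, bytes_per_pixel=3)
# 0x10000000 is a made-up DDR address used only for this example.
request = read_request(0x10000000, frame)
print(request, frame.frame_bytes)
```

Under these assumptions one line is 5760 bytes and one full picture occupies 6,220,800 bytes of DDR.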
COMPONENT DESCRIPTION - VTC
Video timing controller: this component times the data transfer from the VDMA to the "stream to video out" component. It generates the signals for the vertical transfer (line count) and the horizontal transfer (pixels per line), as well as the active video signal. It runs on a clock chosen to match the picture size and the number of pictures per second (see Clock definitions).

COMPONENT DESCRIPTION - STREAM
Stream to video out: the AXI4-Stream to Video Out core converts the AXI4-Stream Video protocol, used by Xilinx video processing cores such as the VDMA, into a video output with explicit sync and timing signals, as expected by the unit we built to drive the VGA port. In our project the unit converts the AXI_stream output of the VDMA into an actual video protocol that consists of:
1. Active data signal
2. Vertical sync and horizontal sync
3. Blank periods

COMPONENT DESCRIPTION - RGB_OUT
RGB_out: we built this unit ourselves to convert the data from 8 bits per color (red, green and blue) to 4 bits per color. Its outputs are the inputs of the VGA port:
- RGB, 4 bits per color
- Vsync
- Hsync

DATA FLOW
Step 1: The data is sent from an external device to the UART (bitmap to pixels only).
Step 2: The data is extracted from the UART and saved to the memory.
Step 3: The ZYNQ processor triggers the VDMA by sending the start address and the size of the data in the memory over AXI_LITE.
Step 4: The data is transferred to the VDMA via the memory controller and saved in a frame buffer.
Step 5: The VDMA sends the data to the "stream to video out" unit, in step with the VTC timing.
Step 6: The data is passed from the "stream to video out" unit to the RGB_out component, which puts it in the form the VGA port requires.
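The RGB_out conversion (8 bits per color in, 4 bits per color out, keeping the top 4 bits of each color as noted in the critical issues) can be modelled in a few lines. This is a software sketch of the behavior only, not the hardware description used in the project, and the function name is hypothetical.

```python
def rgb888_to_rgb444(red: int, green: int, blue: int) -> tuple:
    """Keep the 4 most significant bits of each 8-bit color component."""
    for name, value in (("red", red), ("green", green), ("blue", blue)):
        if not 0 <= value <= 255:
            raise ValueError(f"{name} must be an 8-bit value, got {value}")
    # Shifting right by 4 drops the low nibble and keeps the high nibble.
    return red >> 4, green >> 4, blue >> 4


print(rgb888_to_rgb444(255, 128, 15))  # (15, 8, 0)
```

Dropping the low nibble means that input values 0 to 15 all map to output level 0, so fine gradients in the source image are lost on the VGA output.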
REQUIRED TESTS AND CHECK POINTS
We use several strategic check points to validate the design:
- UART to memory: we first check that the data delivered from the external device through the UART is saved in the memory at the intended location.
- Memory to VDMA: we check that the data is transferred correctly from the memory to the frame buffers inside the VDMA.
- VDMA to "stream_to_video_out": we check that the data is transferred correctly from the VDMA to the "stream to video out" unit by reading the data running on the AXI_stream bus.
- Control signals: we check that the "video_timing_controller" sends the timing signals as we assumed it would.
- "stream_to_video_out" to VGA: we check that the data leaves the "stream_to_video_out" unit in the intended 24-bit format (8 bits per color, 3 colors R, G, B).
- VGA output: we check that the data at the output of the VGA component is the picture we delivered. It should be displayed on the screen.

CLOCK DEFINITIONS
There are 2 main clocks in the design (besides the ARM clock and the DDR clock):
- The faster clock is used for the AXI4_lite bus that connects the ARM and the VDMA. On that bus the ARM transfers the address and the size of the picture in the memory. This clock is set to 200 MHz.
- The slower clock is used for the full AXI4 bus and the AXI_stream bus. On these buses the data moves from the memory to the VDMA and then from the VDMA to the "stream to video out" unit. This clock is set to 148.5 MHz, one cycle per pixel, which gives 148.5 megapixels per second. The value is chosen to fit the amount of data transferred in one second:
  (pixels per line including blank) * (number of lines including blank) * (pictures per second)
  For us: 2200 * 1125 * 60 = 148.5 [MHz]
To fit the screen in the lab we needed a picture of 1080*1920 pixels at 60 pictures per second.
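The pixel clock calculation above can be checked with a short script. The totals 2200 and 1125 and the active size 1920 by 1080 come from the slides; the function and variable names are only illustrative.

```python
def pixel_clock_hz(total_pixels_per_line: int, total_lines: int,
                   frames_per_second: int) -> int:
    """One clock cycle per pixel, counting the blank periods as well."""
    return total_pixels_per_line * total_lines * frames_per_second


clock_hz = pixel_clock_hz(2200, 1125, 60)
print(clock_hz / 1e6, "MHz")  # 148.5 MHz

# Blank periods: the difference between the total and the active picture.
horizontal_blank_px = 2200 - 1920
vertical_blank_lines = 1125 - 1080
print(horizontal_blank_px, vertical_blank_lines)  # 280 45
```

The 280 blank pixels per line and 45 blank lines per picture are why the clock is higher than 1920 * 1080 * 60 alone would require.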
BLOCK DIAGRAM
[Figure: block diagram of the design, showing the VDMA sub system and the ZYNQ sub system]

PS SIDE OVERVIEW
[Figure: PS side of the design. Callouts: to VDMA via AXI-Lite; to VDMA via AXI4; pin to pin]

"STREAM TO VIDEO OUT" INTERFACE
[Figure: interface of the unit. Callouts: from VDMA; from video timing controller; to VGA out]

VIDEO TIMING CONTROLLER
[Figure: interface of the unit. Callouts:]
- For the write channel: not in use.
- Optionally can be controlled by the processor: not in use.
- Output timing signals for the stream_to_video_out unit.

RGB_OUT
[Figure: interface of the unit. Callouts:]
- Input: sync signals; data, 8 bits per color.
- Output: sync signals; data, 4 bits per color.

VDMA
[Figure: interface of the unit. Callouts:]
- Connected to the memory on a full AXI4 bus. Required for data transfer.
- The connection to the processor, from which the VDMA gets the address and the size of the picture.
- The 3 clocks of the design.
- Connected to the "stream to video out" unit on an AXI4_stream bus. Required for data transfer.

SOFTWARE DESCRIPTION
Software flowchart:
Host (MATLAB):
- Rescale the BMP image, open the UART for writing and send the image.
Board (C code, SDK):
- Load the incoming image from the UART into the DDR (hand shake protocol).
- VDMA configuration and setup.
- VDMA start transfer.

CRITICAL ISSUES AND SOLUTIONS
1. Problem: The UART buffer is limited and uninitialized.
   Solution: See UART issues.
2. Problem: Stream to video out always outputs 0.
   Solution: Hardware: connect the reset as active low and tie the clock enable to constant 1. Software: park on a frame using the appropriate Xilinx function.
3. Problem: The SAMSUNG screen requires a picture format of 1080*1920 at 60 [Hz].
   Solution: Hardware: configure the peripheral clock to 2200*1125*60 = 148.5 [MHz].
4. Problem: Image with "noise" (inconsistent writing problem).
   Solution: Software: disable the caches.
5. Problem: The video timing controller driver isn't loaded into SDK.
   Solution: SDK loads drivers only for components that are connected directly to the ZYNQ over AXI-Lite. A driver (C code) can modify registers at run time through the AXI-Lite interface (as with the VDMA), but a driver is not always necessary. In some cases the settings only need to be defined when the hardware is instantiated in Vivado (as with the VTC). Because the VTC behavior is fixed, it needs no connection to the CPU, so it was not connected to AXI_LITE and no driver is needed.
6. Problem: The output of the "stream to video out" unit is 8 bits per color, while VGA is 4 bits per color.
   Solution: The RGB_out unit takes the top 4 bits of each color.

UART ISSUES
1. Problem: The colors of the picture are completely mixed.
   Solution: MATLAB: the same data was sent in a loop until it was read by the board, instead of sending it once and waiting for an ACK.
2. Problem: The shape was almost correct, but the colors differed from the original picture.
   Solution: The UART buffer works as a FIFO, therefore the data is saved in the memory in reverse order.
3. Problem: The picture on the screen starts with an offset.
   Solution: Clean the FIFO of the UART at the beginning of the read operation. The function is set Options(UART, reset).
4. Problem: The data transfer got stuck each time after a different number of transfers.
   Solution: A simple handshake protocol was created in which MATLAB sends the data to the UART in 64-byte blocks (the UART's buffer size) and waits for an acknowledge from the board.
5. Problem: Every so often one transfer is unsuccessful. When that happens, the picture on the screen is shifted and the colors change. If the failure is caused by a lost acknowledge, the data was received but MATLAB thinks it was not, because no ACK was detected. The data is then resent even though the board received it successfully. ACKNOWLEDGE LOST (SHIFT RIGHT)
   Solution: The SDK timeout for resending the acknowledge appeared to be equal to the MATLAB timeout before sending the block again, so each time an acknowledge was lost the data was sent twice. The MATLAB timeout was changed to be 10 times longer than the SDK timeout and the problem was solved.
6. Problem: The same thing happens when the communication is unsuccessful because the transmitted data is lost. DATA LOST (SHIFT LEFT)
   Solution: Create a protocol that can tell the two causes of the error apart. In the data lost case it should resend the data.

FINAL RESULTS
[Figures: image initialization; final presentation]

SUMMARY AND CONCLUSION
The project goals were to load data from an external device and save it to the internal memory, and to create a data path in which the data is transferred from the memory and eventually streamed out to the VGA port in the right format.
The first goal, even though it seems less complicated, required more "mind work" in the sense of going into small details when creating a communication protocol. A lot of thought has to go into setting the rate of the timing controller, the timeouts of the two sides of the communication, and more.
The second goal of this project required a great amount of learning, starting with the board being used, the Vivado tool and the SDK tool. After that, all the IPs mentioned here had to be fully understood in order to be configured correctly and connected correctly to the other components. Connecting a system that contains more than a few parts is a very delicate task and should be done with great attention to small details. Creating such a system can be good preparation for any project, even outside academia.

There are a few conclusions from this project:
- When given a fixed component, one should fully understand its use before starting to work with it.
- If one is "stuck" in some part of the project, it is best to work around the problem in whatever way is possible and move on, while continuing to think about how to solve it. That way the work keeps progressing most of the time.
- One should always use all the help one can get. Most of the time, when technical issues block progress, the solution is found with a technical person or the supervisor.

APPENDIX: SOFTWARE

MATLAB
The main goal of the program is to create a MATLAB GUI interface between the PC and the Zedboard in order to load the desired image.
Step one: Determine the desired UART port configuration (8 data bits, 1 stop bit, 115200 baud rate).
Step two: Load the bitmap image into MATLAB and rescale its dimensions to 1080 * 1920.
Step three: Open the port and send the information (hand shake protocol).

C CODE - SDK: LOAD PICTURE INTO DDR
The program should read the incoming data and load it into the DDR.
At this point of the project we encountered a technical problem: the Zedboard UART buffer size in polling mode is only 64 bytes, so at this time there was no option to load the whole picture. In order to continue with the development, the pictures were written manually into the DDR.

C CODE - SDK: VDMA CONFIGURATION AND SETUP
Initialize the DMA engine: a VDMA instance is set to the VDMA physical address.
Set up the read channel: the VDMA module uses only the read channel (MM2S). This sets the vertical and horizontal lengths, the frame store start address, and other unused registers.

C CODE - SDK: VDMA START TRANSFER
Start the DMA engine: the VDMA read channel is activated.
Parking on a frame: the VDMA reads the same image repeatedly, in order to display the image on the screen continuously.

HAND SHAKE PROTOCOL
[Figure: hand shake protocol]

INCONSISTENT WRITING PROBLEM SOLUTION - DISABLE CACHES
[Figure: inconsistent writing problem and its solution]

ACKNOWLEDGE LOST (SHIFT RIGHT)
[Figure: acknowledge lost]

DATA LOST (SHIFT LEFT)
[Figure: data lost]

VDMA SUB SYSTEM
[Figure: VDMA sub system]

ZYNQ SUB SYSTEM
[Figure: ZYNQ sub system]
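The handshake described in the UART issues (64-byte blocks, an acknowledge from the board, a resend after a timeout) can be sketched as a small simulation. The slides only say that the protocol should tell a lost acknowledge apart from lost data. The alternating sequence bit used here is one common way to do that and is an assumption, not necessarily the protocol the project implemented. All names are hypothetical, and the link loss is simulated rather than read from a real UART.

```python
BLOCK_SIZE = 64  # UART buffer size given in the slides


class Receiver:
    """Board side: stores new blocks and ignores duplicates."""

    def __init__(self):
        self.expected_seq = 0
        self.memory = bytearray()

    def on_block(self, seq: int, payload: bytes) -> int:
        if seq == self.expected_seq:
            self.memory.extend(payload)  # new block: store it
            self.expected_seq ^= 1
        # A duplicate (its ACK was lost) is not stored again, only re-acknowledged.
        return seq


def send_image(data: bytes, receiver: Receiver,
               lost_data=frozenset(), lost_acks=frozenset(),
               max_tries: int = 5) -> int:
    """Host side: send data block by block and return the number of attempts.

    lost_data and lost_acks hold the attempt numbers (starting at 1) on which
    the simulated link drops the block or the acknowledge.
    """
    attempt = 0
    seq = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        payload = data[offset:offset + BLOCK_SIZE]
        for _ in range(max_tries):
            attempt += 1
            if attempt in lost_data:
                continue  # block never arrived: timeout, send again
            ack = receiver.on_block(seq, payload)
            if attempt in lost_acks:
                continue  # acknowledge never arrived: timeout, send again
            if ack == seq:
                break     # acknowledged: move on to the next block
        else:
            raise TimeoutError(f"block at offset {offset} was not acknowledged")
        seq ^= 1
    return attempt


image = bytes(range(256))  # 4 blocks of 64 bytes
board = Receiver()
attempts = send_image(image, board, lost_data={2}, lost_acks={4})
print(attempts, bytes(board.memory) == image)  # 6 True
```

Without the sequence check in `on_block`, the block resent after the lost acknowledge would be stored twice, which corresponds to the shifted picture described in the UART issues.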