Final Presentation of Part A (Annual Project)
Roman Kofman & Sergey Kleyman
Neta Peled & Hillel Mendelson
Supervisor: Mike Sumszyk

• Project recap
• Data flow
• Blocks implementation
• Conclusions
• Project B: timetable

The algorithm: Nonlinear Diffusion
• Uses an iterative numerical solution to solve the diffusion equation
Why use it for image processing?
• Image noise is smoothed
• Edges remain sharp
[Figure: original image next to the result after one iteration with dt = 30. Look at the hat (smoothed) and at the edges (sharp!).]

Difficulties with the semi-implicit model:
• Very complex design (Thomas), which makes real time almost impossible
- Transpose the entire image
- Reverse-order loop
- Multiple memory accesses
So why use this model?
• Strong effect: good results after very few iterations

[Block diagram, columns pass: DVI IN, T', PIPE, M4K LINE REVERSE WRITE/READ, Thomas 3, M4K LINE REVERSE WRITE/READ, T'.]
[Block diagram, lines pass: PIPE, M4K LINE REVERSE WRITE/READ, Thomas 3, M4K LINE REVERSE WRITE/READ, DVI OUT.]
How to implement T' in real time?
[Block diagram, full data path: DVI IN, frequency controller (F to 4F), Transpose (M-RAM WRITE/READ, DDRII T' WRITE/READ), columns pass, Transpose, rows pass, frequency controller (4F to F), DVI OUT. Callouts: double buffers, external memory, balanced channels, reduced frequency.]

AGENDA
Internal memory blocks:
• Addressing controller
• Transpose
• Line reverse
External memory:
• Double buffer on DDR
• Up/down rate controller
DVI synchronization

Addressing controller
• Addressing method, first attempt: use a cache organization approach.
  15 bits = 1 bit (area) + 4 bits (row) + 10 bits (column)
• Fast: direct access to data in memory
• Easy to implement: no logic is needed for "translation"
However, expensive:
• 10 bits is more than we need
for column representation

Addressing controller
• 1st attempt implementation requires: 98KB (consistent with 2^15 addresses of 24-bit pixels)
• 1 M-RAM block is 64KB
Solution
• Use consecutive addressing
• Address = block + row + phase
• Requires "translation" … but:
[Quartus report] Size: 61KB (consistent with 2 areas x 16 rows x 640 columns of 24-bit pixels). Fits!
[Figure: addressing controller with address translation units.]

Transpose
Goal:
• Write the transposed data so that it can later be read sequentially, in rows
Problem:
• Random access in DDR is too expensive: 32 clk penalty!
Solution:
• Use internal memory to invert the order:
- "Pay" most of the penalty in random accesses to FPGA memory
• Write to DDR in "windows":
- Enables sequential row writes
- Penalty only on every row skip

Transpose: how it works
[Figure: M-RAM WRITE, M-RAM READ, DDRII T' WRITE, DDRII T' READ. Labels: sequential read from DDR; penalty all the time; penalty every row skip.]
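The windowed write described on the Transpose slides can be illustrated with a small software model. This is only a sketch under assumptions that are not in the slides: the DDR is modeled as a flat list, an address jump (a write that does not follow the previous address) stands in for a DDR row skip, and the function name `transpose_to_ddr` and the `window` parameter are hypothetical. With `window=1` every pixel goes straight to its transposed slot, which models the "penalty all the time" case.

```python
def transpose_to_ddr(image, window):
    """Write `image` (list of rows) into a flat 'DDR' list in transposed order.

    `window` rows are first collected in internal memory (assumed to model the
    M-RAM stage), then written column by column so that each column
    contributes `window` consecutive DDR addresses.
    Returns (ddr, jumps), where `jumps` counts non-consecutive writes.
    """
    height, width = len(image), len(image[0])
    ddr = [None] * (height * width)
    jumps, prev = 0, None
    for r0 in range(0, height, window):
        block = image[r0:r0 + window]            # rows buffered on chip
        for c in range(width):
            for k, row in enumerate(block):
                addr = c * height + r0 + k       # transposed position
                if prev is not None and addr != prev + 1:
                    jumps += 1                   # stand-in for a row-skip penalty
                ddr[addr] = row[c]
                prev = addr
    return ddr, jumps


if __name__ == "__main__":
    img = [[r * 10 + c for c in range(6)] for r in range(4)]
    for w in (1, 2, 4):
        _, jumps = transpose_to_ddr(img, w)
        print(f"window={w}: {jumps} address jumps")
```

In this model a larger window trades internal memory for fewer address jumps, which is the trade-off the slides describe; the real penalty depends on the DDR row geometry, which the sketch does not model.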

Reverse Line Order
• Used for the Thomas algorithm
Implementation
• On M4K blocks
• Double-sized buffer with alternating pointers for read/write
[Figure: write and read pointers at addresses 0 and 640; the addresses are swapped.]

External memory: double buffer on DDR
• We need very large double buffers that can be integrated easily with FPGA designs
• FPGA is resource limited
Solution: use external memory for this purpose.
• Enables efficient usage of the memory on the GiDEL PROC board
• Up to 16 ports per bank, 2 banks per FPGA
• Each port may be forced to access a different memory area and limited to a certain address space
• Straightforward random memory access with random ports: slow and not efficient
• Segmented working mode option for sequential ports. Enables fast read/write bursts.
• Two ports: sequential read and sequential write. Each accesses a different memory area.
• Implement the double buffer by switching the starting address at the end of every burst.
[Figure: multi-port core connected to our entity (controller with control signals) through a write sequential port and a read sequential port on a fixed CLK; the pipeline design runs on the external DVI CLK. PROBLEM: the two clocks differ.]

Solution
• Add FIFOs to implement data rate matching.
• Altera provides a dual-clock FIFO (DCFIFO) megafunction. Using it before and after each write/read port solves the problem.
• Control logic is integrated into the control entity.
• Extra FIFOs = extra FPGA resources
[Figure: the same blocks with FIFOs between the Multi clk domain and the DVI clk domain on both the write and the read side.]

Buffer controller schema
[State diagram: Reset; Prepare for read/write (including wait states, following the DDR protocol); Read/write; Flush.]
• Symmetric read/write bursts according to the FIFO states
• Burst length can be adjusted
Problem: data is written to DDR only when the internal DDR FIFO is full.
Solution: Flush forces the FIFO to pass data. Not using the accurate flush length results in image noise!
Problem: the flush delay length is not constant and depends on the burst length.
Solution: stretch write bursts until the FIFO is almost full. This lowers the influence of the flush.
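The double buffer described above (two sequential ports, each on its own memory area, with the start addresses switched at the end of every burst) can be sketched as a small software model. This is a minimal illustration, not the project's controller: the class name `DoubleBuffer`, the list standing in for DDR, and the fixed burst length are all assumptions, and the FIFO and flush behavior from the slides is deliberately left out.

```python
class DoubleBuffer:
    """Ping-pong buffer: one area is written while the other is read."""

    def __init__(self, burst_len):
        self.burst_len = burst_len
        self.mem = [0] * (2 * burst_len)   # two areas in one memory (assumed layout)
        self.write_base = 0                # start address of the write port
        self.read_base = burst_len         # start address of the read port

    def burst(self, data_in):
        """Read the previously written area, write a new burst, then swap areas."""
        if len(data_in) != self.burst_len:
            raise ValueError("burst length mismatch")
        out = self.mem[self.read_base:self.read_base + self.burst_len]
        self.mem[self.write_base:self.write_base + self.burst_len] = data_in
        # Switch the starting addresses at the end of the burst.
        self.write_base, self.read_base = self.read_base, self.write_base
        return out


if __name__ == "__main__":
    db = DoubleBuffer(4)
    print(db.burst([1, 2, 3, 4]))   # area not yet written: initial zeros
    print(db.burst([5, 6, 7, 8]))   # returns the first burst, one burst late
```

The model shows the one-burst latency that the swap introduces: data written in one burst becomes readable in the next.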

Fixed controller schema
[State diagram: Reset; Prepare for read/write; Read/write; Flush, entered when the internal FIFO is almost full.]
• Up to 8 buffers per memory bank
• Must comply with bandwidth restrictions (MultiPort utilization)
[Chart: % bandwidth per buffer and integration effort versus number of buffers (0 to 10).]

Up/down rate controller
• In the original design, the down rate used internal memory. However, the needed FIFO will not fit on the FPGA.
• Implementation is based on the DDR buffer with asymmetric read/write
• Extra DDR access
• Input and output DCFIFOs are asymmetric in size
• Down rate buffer: saves to DDR only 1 frame out of 4
• Up rate buffer: reads the same frame from DDR 4 times
[Figure: full data path.]

Re/Wr sync controller
[State diagram: reset; prepare for write; prepare for read; write; read/write; read; flush.]

DVI synchronization
• The signals must pass through the same long delays as the data
• Extra bits written to memory
[Block diagram: DVI rx (hsync, vsync, data enable, clk), DVI in controller, flag frame, Mux, 24 data bits, data path with memory access, 24-bit to 12-bit double rate (12 bits), flag detector, signal generation (gen hsync, gen vsync, gen de), FPGA PLL (clk), DVI tx.]
• Send a known flag through the data path
• Start generating the signals according to the flag arrival
[Block diagrams: the full data path from DVI IN to DVI OUT, shown with added Delay stages (M-RAM WRITE / M-RAM READ) and a 48-bit PIPE in the lines pass, and once more without them.]

Summary
Internal memory blocks:
• Addressing controller
• Transpose
• Line reverse
External memory:
• Double buffer on DDR
• Up/down rate controller
DVI synchronization

• Problem with the board's RESET
• Problem with loading the design

Plan and implement logic blocks:
• SQRT, DIV are the main problem
• Verify required precision (based on our conclusions from part A)
• Integration of frequency controllers and transpose blocks
• Implement one full iteration

Divide between 2 problems:
• Design of logic blocks
• Full DDR blocks integration
How?
• Implement the processing algorithm for a smaller frame
- Avoid using external memory
[Block diagram: DVI IN, M-RAM WRITE, M-RAM READ, sample smaller frame, logic blocks, M-RAM WRITE, M-RAM READ, DVI OUT.]

[Timetable chart. Part A: tests, documentation, tests period. Part B: image processing algorithm, remaining transpose integration, reverse order integration, tests, multi-channel timing calibration, documentation.]

Project B goal: create an end-to-end data path with image processing
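
For reference, the Thomas step that the whole data path is built around can be sketched in software. This is a generic textbook tridiagonal solver, not the project's "Thomas 3" block; the function names, the constant diffusivity, and the reflecting boundaries in `semi_implicit_step` are assumptions made for illustration (nonlinear diffusion would use image-dependent coefficients). The backward loop is the reverse-order pass that motivates the line-reverse buffers.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the lower diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    # Back substitution: runs over the line in reverse order.
    x = [0.0] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def semi_implicit_step(signal, dt):
    """One semi-implicit step of 1D linear diffusion on a line of pixels.

    Assumes constant diffusivity 1, reflecting boundaries, and len(signal) >= 2.
    """
    n = len(signal)
    lower = [0.0] + [-dt] * (n - 1)
    upper = [-dt] * (n - 1) + [0.0]
    diag = [1.0 + dt] + [1.0 + 2.0 * dt] * (n - 2) + [1.0 + dt]
    return thomas_solve(lower, diag, upper, signal)


if __name__ == "__main__":
    line = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
    print(semi_implicit_step(line, dt=30.0))
```

Because the step is implicit, a large time step such as dt = 30 stays stable, which is in line with the slides' point that the model gives a strong effect after very few iterations.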