Presentation Part A

Final Presentation of Part A
(Annual project)
Roman Kofman & Sergey Kleyman
Neta Peled & Hillel Mendelson
Supervisor: Mike Sumszyk

Project Recap

Data Flow

Block implementation

Conclusions

Project B - Time Table

The algorithm: Nonlinear Diffusion
An iterative numerical scheme is used to solve the diffusion equation.
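For reference, the slides do not write the equation out. A common choice, assumed here, is Perona-Malik-type nonlinear diffusion. Its diffusivity g drops near strong gradients, so noise is smoothed while edges are preserved:

$$
\frac{\partial u}{\partial t} = \operatorname{div}\!\big(g(|\nabla u|)\,\nabla u\big), \qquad g(s) = \frac{1}{1 + s^2/\lambda^2}
$$

A semi-implicit discretization that fits the row/column structure of this design is the additive operator splitting (AOS) scheme, also shown here as an assumption. It reduces each step to tridiagonal solves along rows and columns, which is where the Thomas algorithm and the transposes come in:

$$
u^{k+1} = \frac{1}{2}\sum_{l \in \{x,\,y\}} \big(I - 2\tau A_l(u^k)\big)^{-1} u^k
$$

Here $A_l(u^k)$ is the tridiagonal 1D diffusion matrix along direction $l$, built from the diffusivities of the current image $u^k$, and $\tau$ is the time step (dt).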

Why use it for image processing?
• Image noise is smoothed
• Edges remain sharp
[Figure: original image vs. the result of one iteration with dt = 30 (!). Look at the hat (smoothed) and at the edges (sharp!).]

Difficulties with the semi-implicit model:
• Very complex design (Thomas algorithm), which makes real time almost impossible. It requires:
  - transposing the entire image
  - a reverse-order loop
  - multiple memory accesses
So why use this model?
• Strong effect: good results after very few iterations
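The Thomas algorithm solves a tridiagonal system in two sweeps. A forward sweep runs along the line, and a backward substitution then visits the same line in reverse order; this is why the design needs reverse-order buffers. Below is a minimal Python sketch of one semi-implicit step along a single image line, solving (I - τA(u))·u_new = u with reflecting boundaries.

Several details are assumptions for illustration, not taken from the slides:
• the Perona-Malik-type diffusivity
• the parameter names tau and lam
• the function names

In a 2D additive-operator-splitting scheme, τ would be replaced by 2τ and the row and column results averaged.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    # Forward sweep: from the first pixel of the line to the last.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Backward substitution: in reverse order, from the last pixel to the first.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def diffusivity(grad, lam):
    """Assumed Perona-Malik-type diffusivity g(s) = 1 / (1 + s^2 / lam^2)."""
    return 1.0 / (1.0 + (grad / lam) ** 2)


def semi_implicit_step_1d(u, tau, lam):
    """One semi-implicit step (I - tau * A(u)) u_new = u along one image line,
    with reflecting boundaries; A(u) is built from the old values u."""
    n = len(u)
    a = [0.0] * n
    b = [1.0] * n
    c = [0.0] * n
    for i in range(n - 1):
        w = tau * diffusivity(u[i + 1] - u[i], lam)  # coupling of pixels i, i+1
        c[i] -= w
        a[i + 1] -= w
        b[i] += w
        b[i + 1] += w
    return thomas(a, b, c, list(u))


if __name__ == "__main__":
    line = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
    print(semi_implicit_step_1d(line, tau=30.0, lam=1.0))
```

With tau = 30 (the dt used on the slide) and lam = 1, a step edge from 0 to 10 stays sharp: a jump of roughly 8 remains after one step, and the sum of the line is preserved.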
[Figure: column pass. Blocks: T’, DVI IN, PIPE, M4K line reverse (write/read), Thomas 3, M4K line reverse (write/read), T’.]
How can T’ be implemented in real time?
[Figure: row (lines) pass. Blocks: PIPE, M4K line reverse (write/read), Thomas 3, M4K line reverse (write/read), DVI OUT.]
[Figure: full data-flow diagram. Blocks: DVI IN; frequency controllers (F to 4F, 4F to F); column and row passes (PIPE, M4K line reverse write/read, Thomas 3); transpose stages (M-RAM write/read, DDRII T’ write/read); DVI OUT. Annotations: double buffers, external balanced memory channels, reduced frequency.]
AGENDA
Internal memory blocks:
• Addressing controller
• Transpose
• Line reverse
External memory:
• Double buffer on DDR
• Up/down rate controller
DVI synchronization
Addressing controller
• Addressing method, first attempt: a cache-style organization. The 15-bit address is split into three fields:
  Area (1 bit) | Row (4 bits) | Column (10 bits)
• Fast: direct access to the data in memory
• Easy to implement: no "translation" logic is needed
However, it is expensive:
• 10 bits (1,024 columns) are more than we need to represent a column
Addressing controller (continued)
• The first-attempt implementation requires 98KB
• One M-RAM block is 64KB
Solution:
• Use consecutive addressing
• Address = block + row + phase
• Requires "translation" logic, but:
Size: 61KB. It fits!
[Figure: Quartus report]
Addressing controller: address translation units
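The memory arithmetic behind the two addressing schemes can be checked with a short Python sketch. It rests on two assumptions:
• 24-bit (3-byte) pixels
• an actual extent of 2 areas × 16 rows × 640 columns

These values are inferred rather than stated on the slides. They come from the 1/4/10-bit field widths and from the 640 in the line-reversal figure later in this presentation, and they reproduce the 98KB and 61KB figures. The function names and the exact form of the consecutive mapping are hypothetical.

```python
AREA_BITS, ROW_BITS, COL_BITS = 1, 4, 10   # field widths of the first attempt
AREAS, ROWS, COLS = 2, 16, 640             # assumed actual extents
BYTES_PER_PIXEL = 3                        # assumed 24-bit pixels
MRAM_BYTES = 64 * 1024                     # one M-RAM block: 64KB


def cache_style_address(area, row, col):
    """First attempt: concatenate the fields; no translation logic needed."""
    return (area << (ROW_BITS + COL_BITS)) | (row << COL_BITS) | col


def consecutive_address(area, row, col):
    """Consecutive addressing (hypothetical form): needs a multiply-add translation."""
    return (area * ROWS + row) * COLS + col


def memory_bytes(words):
    return words * BYTES_PER_PIXEL


cache_words = 1 << (AREA_BITS + ROW_BITS + COL_BITS)  # every 15-bit address
consecutive_words = AREAS * ROWS * COLS               # only the addresses used

if __name__ == "__main__":
    print("cache-style:", memory_bytes(cache_words), "bytes")
    print("consecutive:", memory_bytes(consecutive_words), "bytes")
    print("M-RAM block:", MRAM_BYTES, "bytes")
```

Under these assumptions the cache-style layout needs 98,304 bytes (about 98KB), which does not fit in one M-RAM block. The consecutive layout needs 61,440 bytes (about 61KB), which does.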
Transpose
Goal:
• Write the transposed data so that it can later be read sequentially, in rows
Problem:
• Random access to DDR is too expensive: a 32-clock penalty!
Solution:
• Use internal memory to reverse the order:
  - most of the random-access penalty is "paid" in the FPGA's internal memory
• Write to DDR in "windows":
  - enables sequential row writes
  - the penalty is paid only at every row skip
Transpose: how it works
[Figure: M-RAM write/read and DDRII T’ write/read stages. Annotations: "Penalty all the time!", "Penalty every row skip", "Sequential read from DDR".]
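The effect of writing in windows can be illustrated with a small Python model. It assumes a flat, row-major DDR address space. Every write whose address is not the previous address plus one is counted as a penalty. The parameter window_rows stands for the number of rows buffered in M-RAM. The model and the function name are illustrative assumptions, not the project's HDL.

```python
def windowed_transpose(img, window_rows):
    """Write the transpose of img (a list of equal-length rows) to a flat 'DDR'
    array, buffering window_rows rows at a time in 'internal memory'.

    Returns (ddr, penalties): ddr holds the transposed image row by row, and
    penalties counts writes whose address does not follow the previous one.
    """
    h, w = len(img), len(img[0])
    if h % window_rows != 0:
        raise ValueError("image height must be a multiple of window_rows")
    ddr = [None] * (h * w)
    penalties = 0
    last_addr = None
    for r0 in range(0, h, window_rows):
        window = img[r0:r0 + window_rows]  # rows held in on-chip memory (M-RAM)
        for c in range(w):                 # read the window column by column
            for dr in range(window_rows):
                addr = c * h + r0 + dr     # row c, column r0 + dr of the transpose
                if last_addr is None or addr != last_addr + 1:
                    penalties += 1         # non-sequential DDR access
                ddr[addr] = window[dr][c]
                last_addr = addr
    return ddr, penalties


if __name__ == "__main__":
    img = [[r * 10 + c for c in range(6)] for r in range(4)]
    for rows in (1, 2):
        print(rows, "row window ->", windowed_transpose(img, rows)[1], "penalties")
```

For an H × W image and a window of R < H rows, the model counts (H/R)·W penalties instead of H·W. For example, a 4 × 6 image costs 24 penalties with R = 1 and 12 with R = 2.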
Reverse Line Order
• Used by the Thomas algorithm
Implementation:
• On M4K blocks
• Double-sized buffer with alternating read/write pointers

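[Figure: the double-sized buffer, with halves starting at addresses 0 and 640; the Write and Read roles alternate between the halves ("Swap addresses").]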
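A minimal Python model of the double-sized buffer described above, assuming one pixel is written and one is read per clock. The class and method names are hypothetical. One half is written in forward order while the previous line is read from the other half in reverse order. The halves swap roles at the end of each line, so the output is delayed by one line.

```python
class LineReverser:
    """Ping-pong line-reversal buffer modelling a RAM of 2 * line_len words."""

    def __init__(self, line_len: int):
        self.n = line_len
        self.mem = [0] * (2 * line_len)
        self.write_base = 0          # half currently being written
        self.read_base = line_len    # half holding the previous line

    def process_line(self, line):
        """Write `line` forwards and return the previous line reversed."""
        if len(line) != self.n:
            raise ValueError("line has the wrong length")
        out = []
        for i, px in enumerate(line):
            # Read the previous line backwards while writing the new one forwards.
            out.append(self.mem[self.read_base + self.n - 1 - i])
            self.mem[self.write_base + i] = px
        # End of line: swap the two halves (the "Swap addresses" step).
        self.write_base, self.read_base = self.read_base, self.write_base
        return out


if __name__ == "__main__":
    rev = LineReverser(4)
    for line in ([1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]):
        print(rev.process_line(line))
```

With line_len = 640, the two halves start at addresses 0 and 640, as in the figure.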



Double Buffer on DDR
We need very large double buffers that can be integrated easily into FPGA designs.
The FPGA's resources are limited.
Solution: use external memory for this purpose.





Multi-port core:
• Enables efficient use of the memory on the GiDEL PROC board
• Up to 16 ports per bank, 2 banks per FPGA
• Each port can be restricted to a different memory area and limited to a given address space
• Straightforward random memory access through random ports: slow and inefficient
• Segmented working mode for sequential ports: enables fast read/write bursts


Our use: two sequential ports, one for reading and one for writing, each accessing a different memory area.
The double buffer is implemented by switching the starting addresses at the end of every burst.
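The start-address switching can be modelled in a few lines of Python. This minimal sketch assumes:
• each burst fills exactly one buffer region of burst_len words
• the read port trails the write port by one burst
• a Python list stands in for the DDR bank

The class and its members are hypothetical names, not part of GiDEL's Multi-port API.

```python
class DdrDoubleBuffer:
    """Two sequential ports on one memory; their start addresses swap after
    every burst, so the reader always sees the burst written just before."""

    def __init__(self, burst_len: int, base: int = 0):
        self.burst_len = burst_len
        self.write_start = base                     # region being written
        self.read_start = base + burst_len          # region being read
        self.mem = [None] * (base + 2 * burst_len)  # stands in for the DDR bank

    def burst(self, data):
        """Write one burst sequentially and read the other region sequentially."""
        if len(data) != self.burst_len:
            raise ValueError("data must be exactly one burst long")
        out = self.mem[self.read_start:self.read_start + self.burst_len]
        self.mem[self.write_start:self.write_start + self.burst_len] = list(data)
        # End of burst: switch the starting addresses of the two ports.
        self.write_start, self.read_start = self.read_start, self.write_start
        return out


if __name__ == "__main__":
    buf = DdrDoubleBuffer(burst_len=3)
    for chunk in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
        print(buf.burst(chunk))
```

The first burst reads back empty (None) words because nothing has been written yet. After that, each burst returns the previous one.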
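[Figure: our entity (controller with control signals) connecting the pipeline design to the Multi-port core's write and read sequential ports. PROBLEM: the Multi-port core runs on a fixed clock, while the pipeline design runs on the external DVI clock.]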




Solution
Add FIFOs to implement data-rate matching:
• Altera provides a dual-clock FIFO (DCFIFO) megafunction; placing one before each write port and after each read port solves the problem
• The control logic is integrated into the control entity
• Drawback: extra FIFOs = extra FPGA resources
[Figure: the same structure with DCFIFOs added between the pipeline design (DVI clk) and the Multi-port core's write and read sequential ports (Multi clk), managed by our entity's controller.]
Buffer controller
[Figure: state diagram with states Reset; Prepare for read/write (following the DDR protocol, including wait states); Read/write; Flush (see next slide).]
• Symmetric read/write bursts, according to the FIFO states
• Burst length can be adjusted
Problem: data is written to DDR only when the internal DDR FIFO is full.
Solution: a flush forces the FIFO to pass its data. Using an inaccurate flush length results in image noise!
Problem: the flush delay is not constant; it depends on the burst length.
Solution: stretch write bursts until the FIFO is almost full. This reduces the influence of the flush.

Fixed controller
[Figure: state diagram with states Reset, Prepare for read/write, Read/write, and Flush; transition condition: "Internal FIFO is almost full".]


• Up to 8 buffers per memory bank
• Must comply with bandwidth restrictions (Multi-port utilization)
[Figure: bandwidth per buffer (%) and integration effort vs. number of buffers (0 to 10).]




Up/Down Rate Controller
• In the original design, the down-rate stage used internal memory; however, the FIFO it needs would not fit on the FPGA
• The implementation is based on the DDR buffer with asymmetric read/write:
  - an extra DDR access
  - input and output DCFIFOs of asymmetric size
[Figure: down-rate buffer (saves only 1 frame out of 4 to DDR) → full data path → up-rate buffer (reads the same frame from DDR 4 times).]
[Figure: read/write sync controller state diagrams. States: Reset, Prepare for write, Prepare for read, Write, Read, Read/write, Flush.]
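At frame level, the down-rate and up-rate buffers can be sketched as Python generators. The sketch assumes a rate factor of 4 and that the first frame of every group of four is the one kept. DDR timing and the DCFIFOs are ignored, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def down_rate(frames: Iterable[T], factor: int = 4) -> Iterator[T]:
    """Down-rate buffer: pass (save to DDR) only 1 frame out of `factor`."""
    for i, frame in enumerate(frames):
        if i % factor == 0:
            yield frame


def up_rate(frames: Iterable[T], factor: int = 4) -> Iterator[T]:
    """Up-rate buffer: read each stored frame `factor` times."""
    for frame in frames:
        for _ in range(factor):
            yield frame


if __name__ == "__main__":
    processed = list(down_rate(range(12)))  # what the full data path sees
    print(processed, list(up_rate(processed)))
```

Chaining the two keeps the output frame count equal to the input frame count, while the data path in between sees only a quarter of the frames.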
DVI Synchronization
• The sync signals must pass through the same long delays as the data
• Extra bits are written to memory
[Figure: DVI synchronization block diagram. Blocks: DVI rx; DVI in controller; mux (24-bit data to 12 bits at double rate, with a flag frame); data path with memory access; flag detector; signal generation (hsync, vsync, data enable, clk from the FPGA PLL); DVI tx.]


• Send a known flag through the data path
• Start generating the sync signals according to the flag's arrival
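Two pieces of this path can be sketched in Python: the 24-bit to 12-bit double-rate split, and flag detection. The sketch makes two assumptions:
• each 24-bit pixel is sent as two 12-bit words, high half first
• the known flag travels as one extra bit next to each word, as one possible form of the "extra bits written to memory"

Both the bit order and the flag encoding are illustrative guesses, not the project's actual format. All function names are hypothetical.

```python
def split_24_to_12(pixel: int):
    """Split a 24-bit pixel into two 12-bit words, high half first (assumed order)."""
    if not 0 <= pixel < (1 << 24):
        raise ValueError("pixel must fit in 24 bits")
    return (pixel >> 12) & 0xFFF, pixel & 0xFFF


def merge_12_to_24(high: int, low: int) -> int:
    """Rebuild a 24-bit pixel from its two 12-bit words."""
    return ((high & 0xFFF) << 12) | (low & 0xFFF)


def to_link(pixels, flags):
    """Send pixels at double rate: two 12-bit words per pixel, each word
    carrying an extra flag bit (hypothetical encoding)."""
    words = []
    for pixel, flag in zip(pixels, flags):
        high, low = split_24_to_12(pixel)
        words.append((high, flag))
        words.append((low, flag))
    return words


def from_link(words):
    """Reassemble (pixel, flag) pairs from consecutive word pairs."""
    return [(merge_12_to_24(words[i][0], words[i + 1][0]), words[i][1])
            for i in range(0, len(words) - 1, 2)]


def first_flag(words):
    """Flag detector: index of the first pixel whose flag is set, or None.
    Sync-signal generation would start at this pixel."""
    for index, (_, flag) in enumerate(from_link(words)):
        if flag:
            return index
    return None


if __name__ == "__main__":
    link = to_link([0x000001, 0x123456, 0xFFFFFF], [0, 1, 0])
    print(from_link(link), first_flag(link))
```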
[Figure: final data-flow block diagrams. Blocks: column and row passes (PIPE, M4K line reverse, Thomas 3), transpose stages (M-RAM, DDRII T’), frequency controllers (F to 4F, 4F to F), DVI IN and DVI OUT, now with an added Delay stage on M-RAM and 48-bit annotations.]
Summary
Internal memory blocks:
• Addressing controller
• Transpose
• Line reverse
External memory:
• Double buffer on DDR
• Up/down rate controller
DVI synchronization

• Problem with the board's RESET
• Problem with loading the design

Plan and implement the logic blocks:
• SQRT and DIV are the main problem
• Verify the required precision (based on our conclusions from Part A)
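The slides do not say which SQRT and DIV methods are planned. Two common hardware-friendly options, assumed here, are:
• the digit-by-digit binary square root
• shift-and-subtract (restoring) division

Both use only shifts, subtractions and comparisons, and produce one result bit per step. In this minimal Python sketch, the function names and the fixed-point helper (frac_bits) are hypothetical; the helper is there to support the precision check.

```python
def isqrt_bitwise(n: int) -> int:
    """floor(sqrt(n)) using shifts, additions and comparisons only;
    one result bit per iteration (digit-by-digit method)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 0
    # Start from the highest power of four that is <= n.
    bit = 1 << (2 * ((n.bit_length() - 1) // 2)) if n > 0 else 0
    while bit:
        if n >= result + bit:
            n -= result + bit
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2
    return result


def sqrt_fixed(x_raw: int, frac_bits: int) -> int:
    """Square root of an unsigned fixed-point value with frac_bits fractional
    bits, in the same format: sqrt(x / 2^f) * 2^f = sqrt(x * 2^f)."""
    return isqrt_bitwise(x_raw << frac_bits)


def divide_restoring(dividend: int, divisor: int, width: int = 32):
    """Unsigned shift-and-subtract (restoring) division, one quotient bit per
    step. Returns (quotient, remainder) for a dividend of at most width bits."""
    if divisor <= 0 or not 0 <= dividend < (1 << width):
        raise ValueError("expected 0 <= dividend < 2**width and divisor > 0")
    quotient, remainder = 0, 0
    for i in range(width - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down a bit
        if remainder >= divisor:  # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder


if __name__ == "__main__":
    print(sqrt_fixed(2 << 16, 16) / 2**16, divide_restoring(1000, 7))
```

Because both routines truncate, sqrt_fixed is accurate to within one unit in the last place (2^-frac_bits). This gives a simple baseline against which the required precision can be compared.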


• Integration of the frequency controllers and transpose blocks
• Implement one full iteration
Divide into 2 problems:
• Design of the logic blocks
• Full integration of the DDR blocks
How?
• Implement the processing algorithm for a smaller frame, avoiding the use of external memory
[Figure: test setup without external memory: DVI IN → M-RAM write/read → sample a smaller frame → logic blocks → M-RAM write/read → DVI OUT.]
[Figure: Project B timetable. Tasks: tests; Part A documentation; tests period; image processing algorithm; remaining: transpose integration, reverse order integration; tests; multi-channel timing calibration; Part B documentation.]
Project B goal: create an end-to-end data path with image processing.