Distributed Canny Edge
Detection Algorithm and FPGA
Implementation
Mini Project Report
Submitted by
Parth Mishra
Roll Number: 23ECB0F17
Syed Ahmed
Roll Number: 23ECB0F14
Submitted as a Mini Project for
FPGA Lab
Under the guidance of
Prof. P. Prithvi
Associate Professor
Prof. V. Narendar
Assistant Professor Gr-1
Department of Electronics and Communication
Engineering
National Institute of Technology Warangal
March 2025
NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL, INDIA
DEPARTMENT OF ELECTRONICS AND COMMUNICATION
ENGINEERING
Certificate
This is to certify that the B.Tech. 2nd year (2nd Semester) Mini Project Report
on “Distributed Canny Edge Detector: Algorithm and FPGA Implementation”
using Verilog, submitted by Parth Mishra (23ECB0F17) and Syed Ahmed
(23ECB0F14), is in partial fulfillment of the requirement for the award of the
B.Tech. degree in Electronics and Communication Engineering.
They have successfully completed the Field Programmable Gate Array Lab, and
it is hereby certified that the report is comprehensive and fit for evaluation.
Dr. P. Prithvi
Associate Professor
NIT Warangal

Dr. V. Narendar
Assistant Professor Gr-1
NIT Warangal
FPGA-Based Canny Edge Detection
Abstract
Edge detection is a crucial technique in image processing, widely used
in applications such as object recognition and feature extraction. In this
project, we implement a hardware-accelerated Canny Edge Detection system on an FPGA to achieve real-time image processing with high accuracy and efficiency. Unlike software-based solutions, FPGA-based implementation enables faster computation and parallel processing, making it
ideal for embedded vision systems. Our design utilizes Verilog HDL to
implement various stages of the Canny Edge Detection algorithm, including Gaussian filtering, gradient computation using the CORDIC algorithm,
non-maximum suppression, and double thresholding. The system processes
grayscale images stored in memory and outputs a binary edge-detected image. A pipeline-based architecture is used to optimize performance, and
block RAM (BRAM) is employed for efficient data storage and retrieval. To
verify the design, simulations are conducted in Vivado, and test images are
processed through a MATLAB-based visualization framework. The output
of the FPGA implementation is compared with a C++ software-based edge
detector to validate accuracy and performance. The results demonstrate
that the FPGA-based approach achieves low latency and efficient resource
utilization, making it suitable for real-time applications such as robotics,
surveillance, and medical imaging. This project highlights the benefits of
hardware acceleration for computationally intensive image processing tasks
and serves as a foundation for further research in high-performance embedded vision systems.
Contents

1 Introduction
2 Literature Review
3 Problem Statement
4 Novelty
5 Implementation Steps
6 Objectives
  6.1 Gaussian Filtering
  6.2 Gradient Computation
  6.3 Non-Maximum Suppression
  6.4 Double Thresholding
  6.5 Hardware Implementation
7 Theory
  7.1 Gaussian Filtering
    7.1.1 Concept of Gaussian Filtering
    7.1.2 Purpose of Gaussian Filtering
    7.1.3 Implementation of Gaussian Filtering
    7.1.4 Advantages of Gaussian Filtering
    7.1.5 Gaussian Filtering in FPGA-Based Processing
  7.2 Gradient Computation
    7.2.1 Concept of Image Gradient
    7.2.2 Sobel Operator for Gradient Computation
    7.2.3 Gradient Magnitude and Direction
    7.2.4 FPGA Implementation of Gradient Computation
  7.3 Non-Maximum Suppression
    7.3.1 Working Principle of Non-Maximum Suppression
    7.3.2 Mathematical Representation
    7.3.3 FPGA Implementation of Non-Maximum Suppression
  7.4 Double Thresholding and Edge Tracking
    7.4.1 Double Thresholding
    7.4.2 Edge Tracking by Hysteresis
    7.4.3 FPGA Implementation of Double Thresholding and Edge Tracking
  7.5 Hardware Implementation
8 Results and Conclusions
9 Future Work
A Appendix
B References
List of Abbreviations

FPGA: Field Programmable Gate Array
HDL: Hardware Description Language
BRAM: Block RAM
VCD: Value Change Dump
List of Figures

1. Workflow of the project
2. Block diagram for edge detection
3. Output simulation
4. Normalized gradient magnitude CDFs
5. Gradient magnitude and its probability
6. Strong edges percentage value
List of Tables

1. Standard deviation of strong edges for each block

1 Introduction
Edge detection is a fundamental task in image processing and computer vision,
playing a crucial role in applications such as autonomous navigation, medical imaging, industrial inspection, and object recognition. Among various edge detection
techniques, the Canny Edge Detection Algorithm is widely regarded as optimal
due to its strong noise reduction capabilities, precise edge localization, and
ability to detect true edges while minimizing false positives. However, traditional
software-based implementations of Canny edge detection are computationally expensive and struggle to meet real-time processing requirements, particularly for
high-resolution images.
To address these challenges, hardware acceleration using Field Programmable
Gate Arrays (FPGAs) provides a compelling solution by offering parallel processing capabilities, low latency, and high computational efficiency. Unlike traditional
CPU or GPU-based implementations, FPGA-based designs enable deterministic
execution and allow for customized pipeline architectures, making them well-suited
for real-time edge detection applications.
Challenges in FPGA-Based Canny Edge Detection

Implementing Canny edge detection on an FPGA presents several challenges:
• High Computational Complexity: The Canny algorithm consists of Gaussian filtering, gradient computation, non-maximum suppression, and double thresholding, each requiring intensive arithmetic operations.
• Memory Bottleneck: The need for multiple intermediate image buffers increases the demand for on-chip memory resources (BRAM or FIFO buffers).
• Efficient Parallelization: Achieving real-time performance requires an efficient pipeline architecture, where multiple stages of the algorithm operate in parallel without bottlenecks.
• Hardware Resource Utilization: Optimizing the usage of logic elements (LUTs), DSP slices, and memory blocks is essential to fit the design on resource-constrained FPGA platforms.
Proposed Solution: Distributed Canny Edge Detector on FPGA

In this project,
we propose an FPGA-accelerated Distributed Canny Edge Detector, designed to
efficiently process images in real-time by distributing computational tasks across
multiple processing elements. The key innovations of our implementation include:
• CORDIC-Based Gradient Computation: The use of the Coordinate Rotation Digital Computer (CORDIC) algorithm eliminates the need for floating-point arithmetic, reducing resource utilization while maintaining high precision.
• FIFO-Based Gaussian Filtering: A streaming architecture with FIFO buffers enables efficient convolution operations without excessive memory requirements.
• Pipeline Optimization for Parallel Processing: The FPGA implementation is structured into multiple pipeline stages, allowing simultaneous execution of filtering, gradient computation, and edge suppression.
• Efficient BRAM Utilization: To store intermediate results and reduce latency, block RAM (BRAM) is used instead of off-chip memory, ensuring minimal delays.
• MATLAB and C++ Integration: The FPGA-based system is complemented by MATLAB-based post-processing and C++ simulation to verify the correctness of the hardware implementation.
Key Contributions of This Work

The main contributions of this project are:
• Development of a Hardware-Optimized Canny Edge Detector: We implement a fully pipelined architecture for real-time edge detection using Verilog HDL on an FPGA.
• Resource-Efficient Gradient Computation with CORDIC: Our design eliminates the need for floating-point multiplications by leveraging CORDIC for angle and magnitude computation.
• Memory-Efficient Image Processing with FIFO Buffers: Instead of large frame buffers, we use row-wise FIFO storage to optimize memory usage.
• Comparison with Software-Based Implementations: The FPGA implementation is benchmarked against MATLAB and C++ implementations to demonstrate its advantages in speed, latency, and hardware resource utilization.
• Real-Time Performance Validation: The system is tested using a pre-stored image loaded from BRAM, demonstrating its feasibility for real-time applications.
By leveraging hardware parallelism, memory-efficient design, and optimized
computational pipelines, our FPGA-based Distributed Canny Edge Detector offers
a high-speed, low-power alternative to traditional software-based edge detection
techniques. This work contributes to the advancement of real-time embedded
vision systems and serves as a foundation for future research in FPGA-accelerated
image processing.
2 Literature Review
Edge detection is a fundamental technique in image processing and computer
vision, playing a crucial role in applications such as object recognition, medical
imaging, and autonomous navigation. Over the years, various edge detection
algorithms have been developed, each with different trade-offs in terms of accuracy,
computational efficiency, and noise resilience. Traditional methods, such as the
Sobel, Prewitt, and Roberts operators, primarily rely on first-order derivatives
to detect intensity variations. However, these methods often suffer from noise
sensitivity and lack the precision needed for real-world applications. The Laplacian
of Gaussian (LoG) method improves noise resistance by incorporating Gaussian
smoothing before edge detection, but its computational cost is significantly higher.
Among these approaches, the Canny Edge Detection algorithm, introduced by
John Canny in 1986, remains one of the most widely used due to its superior performance in detecting true edges while suppressing noise and false detections. The
algorithm consists of several stages, including Gaussian filtering, gradient computation, non-maximum suppression, and double-thresholding, making it more computationally intensive than earlier methods. Software-based implementations of
Canny Edge Detection using MATLAB, OpenCV, and Python have demonstrated
high accuracy, but their reliance on sequential processing limits real-time performance, especially for high-resolution images. Various optimization techniques,
such as multi-threading and GPU acceleration, have been explored to enhance
computational speed. However, while GPUs offer substantial improvements in execution time, their high power consumption makes them unsuitable for embedded
and resource-constrained applications. Approximate computing techniques, such
as reduced precision arithmetic, have also been investigated to reduce computational complexity, but they introduce slight accuracy trade-offs.
To overcome the performance limitations of software-based implementations,
researchers have explored hardware acceleration using Field-Programmable Gate
Arrays (FPGAs). FPGAs are well-suited for real-time image processing due to
their ability to execute multiple operations in parallel, significantly reducing latency compared to traditional CPU-based processing. Early FPGA implementations of Canny Edge Detection primarily focused on direct hardware translation
of the algorithm, but these designs often suffered from high resource utilization,
particularly due to the reliance on floating-point arithmetic. Subsequent studies have introduced optimizations such as replacing floating-point operations with
fixed-point arithmetic and using pipeline architectures to improve efficiency.
One of the key challenges in FPGA-based edge detection is the efficient implementation of gradient computation, which traditionally requires costly multiplications. Researchers have addressed this issue by leveraging the CORDIC
(Coordinate Rotation Digital Computer) algorithm, which eliminates the need
for multipliers by using iterative shift-and-add operations. The use of FIFO-based
Gaussian filtering has also been proposed to minimize memory overhead,
reducing the reliance on large frame buffers. More recent studies have explored
distributed implementations of the Canny Edge Detector on FPGA, where computational tasks are distributed across multiple processing elements to achieve
real-time performance. These distributed approaches have demonstrated significant improvements in processing speed and scalability, making them ideal for
applications requiring real-time edge detection.
When compared to other hardware platforms, FPGA implementations offer
distinct advantages. Traditional CPU-based edge detection suffers from sequential execution bottlenecks, limiting real-time processing capabilities. GPUs provide better parallelism but at the cost of high power consumption, making them
impractical for embedded systems. While Application-Specific Integrated Circuits
(ASICs) can achieve superior efficiency, they lack the flexibility and reconfigurability of FPGAs. As a result, FPGAs strike a balance between performance, power
efficiency, and adaptability, making them an attractive choice for real-time image
processing applications.
Despite these advancements, challenges remain in optimizing FPGA-based
Canny Edge Detection for high-resolution images. Efficient memory utilization is
a critical concern, as many existing implementations require large on-chip memory
resources, limiting scalability. The reliance on floating-point arithmetic in gradient computation continues to be a bottleneck, which can be addressed through
further exploration of hardware-efficient alternatives like the CORDIC algorithm.
Additionally, integrating FPGA-based implementations with MATLAB and C++
simulations can provide a more systematic validation framework, ensuring accuracy before hardware deployment.
Building upon previous research, this project proposes a Distributed Canny
Edge Detector on FPGA that optimizes memory management, gradient computation, and pipeline-based processing to achieve real-time edge detection with minimal resource usage. By leveraging CORDIC-based arithmetic and distributed
computation, the proposed approach aims to provide a low-power, high-speed alternative for embedded vision systems. This work contributes to the advancement
of real-time image processing on FPGA platforms and serves as a foundation for
future research in hardware-accelerated computer vision applications.
3 Problem Statement
Edge detection is essential in image processing applications such as object detection, medical imaging, and surveillance. The Canny Edge Detection algorithm
provides accurate results but is computationally intensive on CPUs and GPUs,
making real-time processing challenging. FPGA-based implementations offer a
power-efficient alternative but face issues like high resource utilization and memory
inefficiencies. Conventional floating-point arithmetic further increases latency and
power consumption. This project proposes a Distributed Canny Edge Detector
on FPGA, utilizing CORDIC-based arithmetic for efficient gradient computation
and FIFO-based Gaussian filtering to optimize memory usage. The design aims
to achieve low-latency, high-speed, and real-time edge detection, making it ideal
for embedded vision applications.
4 Novelty
The novelty of this project lies in its efficient FPGA-based implementation of the
Distributed Canny Edge Detection algorithm, which optimizes real-time performance while minimizing hardware resource utilization. Unlike traditional Canny
edge detection methods that rely on high-power GPU or CPU-based processing,
this design leverages a pipeline-based distributed architecture to parallelize computations, significantly reducing latency. Additionally, the use of CORDIC-based
gradient calculation eliminates the need for conventional multipliers, making the
system more efficient in terms of power and area. The implementation also features a FIFO-based Gaussian filter, which optimizes memory usage compared to
standard convolution-based approaches. By integrating these enhancements, the
project achieves a high-speed, low-power, and scalable edge detection system, suitable for real-time image processing in embedded vision applications.
5 Implementation Steps
• Preprocessing and Image Storage in FPGA: The input image is first
converted into grayscale and resized to 640x480 pixels to match FPGA processing requirements. The processed image is then stored in Block RAM
(BRAM) within the FPGA for efficient access during edge detection.
• Applying Gaussian Filtering: A FIFO-based Gaussian filter is implemented in Verilog to remove noise while preserving edges. The filter
performs convolution using a 3×3 kernel, reducing high-frequency noise
and improving edge accuracy.
• Computing Image Gradients using the CORDIC Algorithm: The gradient magnitude and direction are computed using the CORDIC-based
gradient calculation instead of traditional multipliers. This approach efficiently determines the gradient in the X and Y directions, reducing FPGA
resource consumption.
• Non-Maximum Suppression for Edge Thinning: The obtained gradient values are processed using a non-maximum suppression module
to retain only the most significant edges. This is achieved by comparing
pixel intensities along the gradient direction and suppressing non-dominant
pixels.
• Double Thresholding for Edge Classification: A thresholding operation is applied to categorize pixels into strong edges, weak edges, or
non-edges. This helps in distinguishing valid edges from noise and ensures
proper edge continuity.
• Edge Tracking by Hysteresis: Weak edge pixels that are connected to
strong edges are retained, while isolated weak edges are suppressed. This
ensures that fragmented edges are correctly reconstructed in the final output.
• FPGA Implementation and Simulation: The Canny edge detection
pipeline is synthesized and simulated using Vivado to verify correctness.
The testbench simulates edge detection using an input image and produces
a binary edge map as output.
• Hardware Deployment and Real-Time Testing: The design is deployed on the Nexys A7 FPGA board, where images are loaded from
BRAM. The output edge-detected image is exported as a binary map and
compared with MATLAB/C++ results for validation.
• Performance Evaluation: The FPGA-based implementation is analyzed
for resource utilization (LUTs, BRAM, DSP slices), latency, and
power efficiency.
Figure 1: Workflow of the project
6 Objectives
6.1 Gaussian Filtering
Gaussian filtering is an essential preprocessing step in edge detection, particularly
in the Canny Edge Detection algorithm, as it smooths the image and reduces noise.
High-frequency noise in an image can create false edges, making the detection
process unreliable. The Gaussian filter works by convolving the input image with
a Gaussian kernel, which applies weighted averaging to the surrounding pixels.
This step ensures that unnecessary variations due to noise are suppressed while
preserving important structural details in the image. The Gaussian kernel size
is typically 3×3 or 5×5, and the values in the kernel are chosen based on the
standard deviation (σ) of the Gaussian function.
On FPGA, the Gaussian filter is implemented using a FIFO-based approach,
where pixels stream through a 3×3 convolutional window. Instead of using complex multiplications, shift-and-add operations are utilized to approximate the
Gaussian function efficiently. This technique reduces hardware resource utilization, ensuring low latency and high-speed filtering. The filtered image is then
passed to the next stage of the pipeline, where gradient computation takes place.
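The shift-and-add smoothing described above can be modeled in software. The following minimal C++ sketch is our own illustration, not taken from the report's Verilog sources; it uses the 3×3 binomial kernel [1 2 1; 2 4 2; 1 2 1]/16, which approximates a Gaussian and needs only shifts and adds:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Software model of one 3x3 binomial Gaussian smoothing step at (r, c).
// Kernel [1 2 1; 2 4 2; 1 2 1]/16 approximates a Gaussian and requires
// only shifts and adds, mirroring the hardware-friendly design.
uint8_t gauss3x3(const std::vector<std::vector<uint8_t>>& img, int r, int c) {
    // Weighted sum: corners x1, edge neighbours x2 (<<1), centre x4 (<<2).
    uint32_t acc =
        img[r - 1][c - 1] + (img[r - 1][c] << 1) + img[r - 1][c + 1] +
        (img[r][c - 1] << 1) + (img[r][c] << 2) + (img[r][c + 1] << 1) +
        img[r + 1][c - 1] + (img[r + 1][c] << 1) + img[r + 1][c + 1];
    return static_cast<uint8_t>(acc >> 4);  // divide by 16 via shift
}
```

In hardware the same arithmetic maps to wires and adders, with the FIFO line buffers supplying the three rows of the window on each clock cycle.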
6.2 Gradient Computation
After the image is smoothed using Gaussian filtering, the next step is to compute
the intensity gradients to identify regions of rapid intensity change, which correspond to edges in the image. This is done using Sobel operators, which apply two
convolution kernels to detect changes along both the X-axis (horizontal edges) and
Y-axis (vertical edges). The gradient magnitude is computed as:
$$G = \sqrt{G_x^2 + G_y^2}$$
where Gx and Gy are the gradients in the horizontal and vertical directions,
respectively. The gradient angle (θ) is also calculated to determine the edge orientation.
On the FPGA, CORDIC (COordinate Rotation DIgital Computer) is used to compute the gradient magnitude and direction instead of expensive floating-point operations. The CORDIC algorithm provides an efficient way to perform vector rotations, allowing gradient computation with only shift-and-add operations instead of multiplications.
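The vectoring mode of CORDIC can be sketched in C++ as a behavioral model; this is our own illustration, with doubles standing in for the fixed-point registers an FPGA would use, and with the arctangent values that hardware would hold in a precomputed lookup table:

```cpp
#include <cassert>
#include <cmath>

// CORDIC in vectoring mode: rotate (x, y) toward the x-axis using only
// shift-and-add style updates. After n iterations, x converges to
// K * sqrt(x0^2 + y0^2) with gain K ~= 1.64676, and z accumulates
// atan2(y0, x0). On an FPGA the atan(2^-i) values come from a small table.
void cordic_vectoring(double x, double y, int iters,
                      double& magnitude, double& angle_rad) {
    double z = 0.0;
    for (int i = 0; i < iters; ++i) {
        double d = (y < 0.0) ? 1.0 : -1.0;         // drive y toward zero
        double x_new = x - d * std::ldexp(y, -i);  // y * 2^-i: a shift in HW
        double y_new = y + d * std::ldexp(x, -i);  // x * 2^-i: a shift in HW
        z -= d * std::atan(std::ldexp(1.0, -i));   // table lookup in HW
        x = x_new;
        y = y_new;
    }
    const double K = 1.6467602581210656;  // CORDIC gain (limit value)
    magnitude = x / K;
    angle_rad = z;
}
```

For example, vectoring the point (3, 4) over 20 iterations recovers a magnitude close to 5 and an angle close to atan2(4, 3), with no multiplier needed in the loop body.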
6.3 Non-Maximum Suppression
Non-Maximum Suppression (NMS) is a crucial step that refines the detected edges
by thinning them to single-pixel width. After computing the gradient magnitude
and direction, edges appear as thick lines due to gradient variations. However, only
the strongest edges should be retained while weaker ones must be suppressed. NMS
achieves this by comparing each pixel’s gradient magnitude with its neighboring
pixels along the gradient direction.
6.4 Double Thresholding
After applying NMS, some weak edges remain; these may be caused by noise or may belong to genuine edges. To distinguish between them, double thresholding is used to classify pixels into three categories: strong edges, weak edges, and non-edges.
6.5 Hardware Implementation
The final objective is to implement the entire Canny Edge Detection pipeline
on an FPGA platform. This involves integrating all modules—Gaussian filtering, gradient computation, NMS, and double thresholding—into a fully pipelined
architecture.
7 Theory
7.1 Gaussian Filtering
Gaussian filtering is a fundamental image processing technique used for noise
reduction and smoothing in digital images. It is widely applied in various computer
vision and image processing tasks, including edge detection, feature extraction,
and preprocessing for deep learning models.
7.1.1 Concept of Gaussian Filtering
Gaussian filtering is based on the Gaussian function, which is a bell-shaped curve
defined as:
$$G(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{1}$$
where:
• x, y are the spatial coordinates of the pixel,
• σ is the standard deviation (spread) of the Gaussian distribution,
• The denominator $2\pi\sigma^2$ ensures normalization, so the sum of all weights in the filter kernel is 1.
A Gaussian filter applies this function over a region of an image by convolving
a Gaussian kernel with the image pixels. The kernel weights are determined by
evaluating the Gaussian function at each pixel position in a given window.
7.1.2 Purpose of Gaussian Filtering
• Noise Reduction: The filter smooths out high-frequency variations caused
by noise while preserving important image structures. It is commonly used
in preprocessing for edge detection (e.g., in the Canny Edge Detection algorithm).
• Edge-Preserving Smoothing: Unlike a simple mean filter, which blurs
the image uniformly, the Gaussian filter gives more weight to central pixels,
reducing excessive blurring.
• Preprocessing for Edge Detection: In edge detection techniques like
Canny Edge Detection, Gaussian filtering helps remove noise before detecting edges, ensuring better accuracy.
7.1.3 Implementation of Gaussian Filtering
Gaussian filtering is implemented using a convolution operation where each pixel in
the image is replaced by a weighted sum of its neighbors, defined by the Gaussian
kernel. A typical 5×5 Gaussian kernel with σ = 1, normalized by the sum of its integer weights (273), is:

$$\frac{1}{273}\begin{bmatrix} 1 & 4 & 7 & 4 & 1 \\ 4 & 16 & 26 & 16 & 4 \\ 7 & 26 & 41 & 26 & 7 \\ 4 & 16 & 26 & 16 & 4 \\ 1 & 4 & 7 & 4 & 1 \end{bmatrix}$$
Each pixel’s new value is computed as the sum of the products of the kernel
values and the corresponding neighborhood pixel values.
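Since the integer weights of this kernel sum to 273, each output pixel is the weighted neighborhood sum divided by 273. A small C++ sketch of one output pixel (the helper name is ours, for illustration only):

```cpp
#include <cassert>

// One output pixel of 5x5 Gaussian filtering with the integer kernel
// whose weights sum to 273; dividing by that sum normalizes the result.
int gauss5x5_pixel(const int img[5][5]) {
    static const int k[5][5] = {
        {1, 4, 7, 4, 1},
        {4, 16, 26, 16, 4},
        {7, 26, 41, 26, 7},
        {4, 16, 26, 16, 4},
        {1, 4, 7, 4, 1}};
    int acc = 0;
    for (int r = 0; r < 5; ++r)
        for (int c = 0; c < 5; ++c)
            acc += k[r][c] * img[r][c];  // weighted neighbourhood sum
    return acc / 273;                    // normalize by the kernel sum
}
```

A quick sanity check: filtering a uniform patch must return the same uniform value, since the normalized weights sum to 1.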
7.1.4 Advantages of Gaussian Filtering
• Reduces noise without significantly affecting image structures.
• Preserves important features, unlike a simple averaging filter.
• Can be efficiently implemented using separable kernels, reducing computational cost.
7.1.5 Gaussian Filtering in FPGA-Based Processing
In hardware implementations like FPGAs, Gaussian filtering is performed using a
streaming architecture with FIFO buffers or line buffers to efficiently handle
pixel data. The convolution operation is parallelized using hardware multipliers
(DSP slices) to achieve real-time processing speeds.
By optimizing memory usage and pipeline architecture, FPGA-based Gaussian
filtering provides low-latency, high-throughput image processing, making it
ideal for applications such as real-time edge detection, medical imaging,
and autonomous vision systems.
7.2 Gradient Computation
Gradient computation is a crucial step in edge detection algorithms, particularly
in the Canny Edge Detection Algorithm. The gradient of an image represents changes in intensity, highlighting regions with significant variations, which
typically correspond to edges.
7.2.1 Concept of Image Gradient
The image gradient measures the rate of change of pixel intensities in an image.
Mathematically, it is computed as the derivative of the image function. Given a grayscale image I(x, y), the gradient components are computed using partial derivatives in both the x- and y-directions:
$$G_x = \frac{\partial I}{\partial x}, \qquad G_y = \frac{\partial I}{\partial y} \tag{2}$$
where:
• Gx represents changes in intensity along the horizontal direction.
• Gy represents changes in intensity along the vertical direction.
7.2.2 Sobel Operator for Gradient Computation
A common method for computing gradients in digital images is the Sobel operator, which applies convolution with two predefined 3×3 kernels to approximate
the derivatives:
$$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$
Applying these kernels to an image results in two gradient matrices:
$$G_x = I * S_x, \qquad G_y = I * S_y \tag{3}$$
where ∗ represents the convolution operation.
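As an illustration, applying Sx and Sy at the center of a 3×3 patch can be written as below. This sketch is our own and uses the correlation form, which differs from strict convolution only in the sign convention of the kernels; the distinction does not affect the gradient magnitude.

```cpp
#include <cassert>

// Apply the Sobel kernels Sx and Sy at the centre of a 3x3 grayscale
// patch (correlation form). A vertical step edge should give a strong
// horizontal response Gx and a zero vertical response Gy.
void sobel_at(const int img[3][3], int& gx, int& gy) {
    static const int sx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int sy[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    gx = 0;
    gy = 0;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) {
            gx += sx[r][c] * img[r][c];
            gy += sy[r][c] * img[r][c];
        }
}
```

For a patch with a vertical edge, such as three rows of {0, 0, 255}, the horizontal gradient is (1 + 2 + 1) × 255 = 1020 while the vertical gradient cancels to 0.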
7.2.3 Gradient Magnitude and Direction
Once the gradients Gx and Gy are computed, the overall gradient magnitude is
determined using the Euclidean norm:
$$G = \sqrt{G_x^2 + G_y^2} \tag{4}$$
The gradient direction (edge orientation) is computed as:
$$\theta = \tan^{-1}\!\left(\frac{G_y}{G_x}\right) \tag{5}$$
The angle θ determines the direction in which the intensity changes the most.
This information is crucial for non-maximum suppression, where only the
strongest edges in the gradient direction are retained.
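For non-maximum suppression, θ is then quantized to the four directions 0°, 45°, 90°, and 135°. A C++ sketch of that quantization follows; the 22.5° bin boundaries are the conventional choice and are our assumption, since the report does not state them explicitly:

```cpp
#include <cassert>
#include <cmath>

// Quantize the gradient angle atan2(Gy, Gx) to the four directions used
// by non-maximum suppression. Direction is taken modulo 180 degrees,
// since an edge at theta and theta + 180 has the same orientation.
int quantize_direction(int gx, int gy) {
    const double PI = 3.14159265358979323846;
    double deg = std::atan2(static_cast<double>(gy),
                            static_cast<double>(gx)) * 180.0 / PI;
    if (deg < 0.0) deg += 180.0;               // fold into [0, 180)
    if (deg < 22.5 || deg >= 157.5) return 0;  // horizontal comparison
    if (deg < 67.5) return 45;                 // diagonal
    if (deg < 112.5) return 90;                // vertical comparison
    return 135;                                // other diagonal
}
```

On an FPGA this decision is usually made without computing θ at all, by comparing the signs and relative magnitudes of Gx and Gy; the software form above is only the clearest statement of the rule.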
7.2.4 FPGA Implementation of Gradient Computation
Implementing gradient computation on an FPGA requires efficient handling of
convolution operations and parallel processing. The key optimizations include:
• Pipelining: To perform simultaneous computations on different pixels.
• Fixed-Point Arithmetic: To avoid resource-intensive floating-point operations.
• Hardware Multipliers (DSP Slices): For efficient computation of Sobel
filters.
• Memory Optimization: Using Block RAM (BRAM) or FIFO buffers
for storing image rows instead of full frames.
In an FPGA-based edge detection system, the gradient computation stage
produces pixel-wise edge strength and orientation, which are then processed in
subsequent steps like non-maximum suppression and thresholding.
Figure 2: Block diagram for edge detection
7.3 Non-Maximum Suppression
Non-Maximum Suppression (NMS) is a crucial step in the Canny Edge Detection Algorithm that refines edge detection by removing weak edges and
retaining only the most significant ones. After computing the gradient
magnitude and direction, many pixels might have high intensity due to noise
or multiple neighboring pixels detecting the same edge. The goal of NMS is to
thin out edges, ensuring only the sharpest and most accurate edges remain in
the image.
7.3.1 Working Principle of Non-Maximum Suppression
The NMS algorithm works by scanning the entire image pixel-by-pixel and suppressing any pixel that is not a local maximum in the direction of the
gradient. This is done using the following steps:
1. Determine Gradient Direction: Using the previously computed gradient
components Gx and Gy , the gradient angle θ is computed as:
$$\theta = \tan^{-1}\!\left(\frac{G_y}{G_x}\right) \tag{6}$$
Since digital images are discrete, the gradient angle θ is quantized to one
of four principal directions:
• 0° (Horizontal)
• 45° (Diagonal - Top Left to Bottom Right)
• 90° (Vertical)
• 135° (Diagonal - Bottom Left to Top Right)
2. Edge Thinning via Local Maximum Check: Each pixel in the gradient magnitude image is compared with its two neighboring pixels in the
gradient direction:
• If the pixel’s intensity is the highest compared to its neighbors along
the gradient direction, it is retained.
• Otherwise, it is suppressed (set to 0).
The four possible cases are:
• 0°: Compare with left and right neighbors.
• 45°: Compare with top-left and bottom-right neighbors.
• 90°: Compare with top and bottom neighbors.
• 135°: Compare with top-right and bottom-left neighbors.
7.3.2 Mathematical Representation

For a given pixel at position (i, j) with gradient magnitude G(i, j), the suppression is performed as:

$$G'(i, j) = \begin{cases} G(i, j), & \text{if } G(i, j) \text{ is the maximum along the gradient direction} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

7.3.3 FPGA Implementation of Non-Maximum Suppression
Implementing NMS on an FPGA requires efficient handling of pixel comparisons
and memory access. The main considerations include:
• Sliding Window Buffer: A small 3×3 pixel window is maintained using
FIFO buffers or Block RAM (BRAM) for efficient processing.
• Parallel Processing: The gradient angle computation and pixel comparison are performed in parallel using hardware comparators and multiplexers.
• Conditional Operations: Edge retention is determined using a combination of multiplexers (MUX) and logical conditions.
By implementing these optimizations, the FPGA can efficiently process the
image in real-time while ensuring accurate edge detection.
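The local-maximum check of Section 7.3.1 reduces, per pixel, to comparing the center of a 3×3 magnitude window against its two neighbors along the quantized direction. A minimal C++ model of that decision (the function name is ours, for illustration):

```cpp
#include <cassert>

// One non-maximum suppression decision: keep the centre pixel of a 3x3
// magnitude window only if it is >= both neighbours along the quantized
// gradient direction (0, 45, 90 or 135 degrees); otherwise suppress it.
int nms_center(const int m[3][3], int dir) {
    int a, b;  // the two neighbours along the gradient direction
    switch (dir) {
        case 0:  a = m[1][0]; b = m[1][2]; break;  // left / right
        case 45: a = m[0][0]; b = m[2][2]; break;  // top-left / bottom-right
        case 90: a = m[0][1]; b = m[2][1]; break;  // top / bottom
        default: a = m[0][2]; b = m[2][0]; break;  // top-right / bottom-left
    }
    return (m[1][1] >= a && m[1][1] >= b) ? m[1][1] : 0;
}
```

In hardware the switch becomes a multiplexer selecting the neighbor pair, followed by two comparators, which matches the comparator-and-MUX structure described above.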
7.4 Double Thresholding and Edge Tracking
After Non-Maximum Suppression, the detected edges may still contain noise and
weak responses. To refine the edges and remove false detections, Double Thresholding and Edge Tracking by Hysteresis are applied. This step ensures that
only strong edges are retained while weak edges are either connected to strong
edges or discarded.
7.4.1 Double Thresholding
Double Thresholding is used to classify pixels into three categories based on their
gradient magnitudes:
1. Strong Edges: Pixels with gradient magnitude greater than the high
threshold are considered definite edges.
2. Weak Edges: Pixels with gradient magnitude between the low and high
thresholds are considered potential edges.
3. Non-Edges: Pixels with gradient magnitude below the low threshold
are suppressed (set to 0).
Mathematically, if a pixel at position (i, j) has a gradient magnitude G(i, j),
it is classified as:
E(i, j) = { 1,         if G(i, j) ≥ T_high (Strong Edge)
          { 0,         if G(i, j) < T_low (Non-Edge)
          { Weak Edge, if T_low ≤ G(i, j) < T_high                          (8)

where T_high and T_low are the high and low threshold values, respectively.
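This classification needs only two comparators per pixel. A Python reference model (using the same 50/100 values that appear as THRESHOLD_LOW and THRESHOLD_HIGH in the appendix Verilog; the label encoding here is our own) could be:

```python
import numpy as np

STRONG, WEAK, NONE = 2, 1, 0

def double_threshold(mag, t_low=50, t_high=100):
    """Classify pixels per Equation (8): strong (>= t_high),
    weak (t_low <= m < t_high), non-edge (< t_low)."""
    labels = np.full(mag.shape, NONE, dtype=np.uint8)
    labels[mag >= t_high] = STRONG
    labels[(mag >= t_low) & (mag < t_high)] = WEAK
    return labels
```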
7.4.2 Edge Tracking by Hysteresis
Edge Tracking is used to determine whether weak edges should be retained or
discarded. It follows these rules:
• If a weak edge is connected to a strong edge, it is retained as a strong
edge.
• If a weak edge is isolated, it is discarded as noise.
This is typically implemented using a recursive or queue-based approach:
1. Scan the image for weak edges.
2. Check the 8-neighborhood of each weak edge pixel.
3. If at least one neighboring pixel is a strong edge, mark it as a
strong edge.
4. Otherwise, discard it (set to 0).
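The queue-based approach above can be sketched as a breadth-first search seeded from the strong pixels (a software reference model with illustrative names; the FPGA implementation works on local 8-neighborhoods rather than a global queue):

```python
from collections import deque

def hysteresis(labels, strong=2, weak=1):
    """Promote weak pixels 8-connected to strong ones; drop the rest.

    labels: 2-D list with values 0 (non-edge), 1 (weak), 2 (strong).
    Returns a binary edge map.
    """
    h, w = len(labels), len(labels[0])
    # Strong pixels are edges from the start.
    out = [[1 if labels[i][j] == strong else 0 for j in range(w)]
           for i in range(h)]
    queue = deque((i, j) for i in range(h) for j in range(w)
                  if labels[i][j] == strong)
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < h and 0 <= nj < w
                        and labels[ni][nj] == weak and not out[ni][nj]):
                    out[ni][nj] = 1          # weak edge joins a strong chain
                    queue.append((ni, nj))   # keep growing from it
    return out
```

Because promoted weak pixels are themselves enqueued, whole chains of weak pixels that eventually touch a strong pixel are retained, matching the rules above.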
7.4.3 FPGA Implementation of Double Thresholding and Edge Tracking
Implementing this step on an FPGA requires careful resource optimization due
to:
• Memory Access: Storing intermediate pixel states (strong, weak, non-edge).
• Parallel Processing: Checking 8-neighborhood connections efficiently.
• Threshold Comparisons: Using comparators and lookup tables (LUTs)
for classification.
By implementing these optimizations, the FPGA efficiently filters out noise,
preserves real edges, and achieves real-time performance.
7.5 Hardware Implementation
The complete pipeline is implemented in a pipelined FPGA architecture, ensuring
real-time processing. The processed image is stored in BRAM and later exported
for verification using MATLAB.
8 Results and Conclusions
The implementation of a Distributed Canny Edge Detector on FPGA presents
a highly efficient and parallelized approach to real-time edge detection in image processing applications. By leveraging FPGA’s hardware parallelism, the
system achieves significant improvements in processing speed, power efficiency,
and resource utilization compared to traditional software-based implementations.
The modular design, incorporating Gaussian filtering, gradient computation, non-maximum suppression, and double thresholding, ensures a systematic and optimized execution of the Canny edge detection algorithm.
Figure 3: Output simulation for distributed canny edge algorithm
Figure 4: Normalized gradient magnitude CDFs and CDFs of Blocks
Figure 5: Gradient magnitude and its probability
Figure 6: Strong edges percentage value
Figure 7: Standard deviation of strong edges for each block
9 Future Work
The Distributed Canny Edge Detector implemented on FPGA has shown
promising results for real-time edge detection. However, several improvements
and extensions can be considered for future research and development:
• Hardware Optimization for Power Efficiency: Implementing techniques such as dynamic voltage scaling, clock gating, and power-aware FPGA
design to reduce energy consumption for embedded applications in robotics
and drones.
• Adaptive Thresholding Techniques: Instead of using fixed thresholds,
integrating machine-learning-based adaptive thresholding can improve robustness under varying lighting conditions, making the system better suited to real-world applications.
• 3D Edge Detection and Depth Estimation: Extending the algorithm
to process stereo images or depth maps can enhance applications in medical
imaging, augmented reality, and autonomous navigation.
• Real-Time Processing of High-Resolution Images: Optimizing memory bandwidth and implementing high-bandwidth memory (HBM) and efficient pipelining strategies to process 4K and higher-resolution images
in real time.
• Integration with AI-based Object Detection: Combining FPGA-based
edge detection with deep learning-based edge refinement to enhance feature
extraction and segmentation for applications in security surveillance, industrial automation, and medical diagnostics.
Future research in these areas will enhance the scalability, efficiency, and
adaptability of FPGA-based edge detection, making it suitable for more advanced real-time vision applications.
A Appendix
This section includes supplementary materials such as Verilog codes, additional
diagrams, or extended explanations that support the main content of the report.
module canny_edge_detect_top#(
parameter DATA_WIDTH = 8,
parameter DATA_DEPTH = 512
)(
input clk,
input rst_n,
input per_frame_vsync,
input per_frame_href,
input per_frame_clken,
input [DATA_WIDTH-1:0] per_img_gray,
output canny_vsync,
output canny_href,
output canny_clken,
output canny_bit
);
wire gaus_vsync;
wire gaus_href;
wire gaus_clken;
wire [DATA_WIDTH-1:0] gaus_img;
wire grandient_vs;
wire grandient_hs;
wire grandient_de;
wire [15:0] gra_path;
wire post_frame_vsync;
wire post_frame_href;
wire post_frame_clken;
wire [1:0] max_g;
wire hysteria_vsync;
wire hysteria_href;
wire hysteria_clken;
wire [1:0] hysteria_data;
reg [DATA_WIDTH-1:0] gaus_img_d1;
reg gaus_vsync_d1,gaus_href_d1,gaus_clken_d1;
reg gaus_vsync_d0,gaus_href_d0,gaus_clken_d0;
image_gaussian_filter u_image_gaussian_filter(
.clk(clk),
.rst_n(rst_n),
.per_frame_vsync(per_frame_vsync),
.per_frame_href(per_frame_href),
.per_frame_clken(per_frame_clken),
.per_img_gray(per_img_gray),
.post_frame_vsync(gaus_vsync),
.post_frame_href(gaus_href),
.post_frame_clken(gaus_clken),
.post_img_gray(gaus_img)
);
always@(posedge clk or negedge rst_n)begin
if(!rst_n)begin
gaus_img_d1 <= 0;
gaus_vsync_d0 <= 0;
gaus_href_d0 <= 0;
gaus_clken_d0 <= 0;
gaus_vsync_d1 <= 0;
gaus_href_d1 <= 0;
gaus_clken_d1 <= 0;
end
else begin
gaus_img_d1 <= gaus_img;
gaus_vsync_d0 <= gaus_vsync;
gaus_href_d0 <= gaus_href;
gaus_clken_d0 <= gaus_clken;
gaus_vsync_d1 <= gaus_vsync_d0;
gaus_href_d1 <= gaus_href_d0;
gaus_clken_d1 <= gaus_clken_d0;
end
end
canny_get_grandient#(
.DATA_WIDTH(DATA_WIDTH),
.DATA_DEPTH(DATA_DEPTH)
)u_canny_get_grandient(
.clk(clk),
.rst_s(rst_n),
.mediant_hs(gaus_href_d1),
.mediant_vs(gaus_vsync_d1),
.mediant_de(gaus_clken_d1),
.mediant_img(gaus_img_d1),
.grandient_hs(grandient_hs),
.grandient_vs(grandient_vs),
.grandient_de(grandient_de),
.gra_path(gra_path)
);
canny_nonLocalMaxValue#(
.DATA_WIDTH(16),
.DATA_DEPTH(DATA_DEPTH)
)u_canny_nonLocalMaxValue(
.clk(clk),
.rst_s(rst_n),
.grandient_vs(grandient_vs),
.grandient_hs(grandient_hs),
.grandient_de(grandient_de),
.gra_path(gra_path),
.post_frame_vsync(post_frame_vsync),
.post_frame_href(post_frame_href),
.post_frame_clken(post_frame_clken),
.max_g(max_g)
);
assign canny_vsync = post_frame_vsync;
assign canny_href = post_frame_href;
assign canny_clken = post_frame_clken;
assign canny_bit = max_g[1] | max_g[0];
endmodule
module canny_get_grandient#(
parameter DATA_WIDTH = 8,
parameter DATA_DEPTH = 512
)(
input clk,
input rst_s,
input mediant_hs,
input mediant_vs,
input mediant_de,
input [DATA_WIDTH -1 : 0] mediant_img,
output grandient_hs,
output grandient_vs,
output grandient_de,
output reg [15 : 0] gra_path
);
parameter THRESHOLD_LOW = 10'd50;
parameter THRESHOLD_HIGH = 10'd100;
reg[9:0] Gx_1;
reg[9:0] Gx_3;
reg[9:0] Gy_1;
reg[9:0] Gy_3;
reg[10:0] Gx;
reg[10:0] Gy;
reg[23:0] sqrt_in;
reg[9:0] sqrt_out;
reg[10:0] sqrt_rem;
wire [23:0] sqrt_in_n;
wire [15:0] sqrt_out_n;
wire [10:0] sqrt_rem_n;
wire [6 :0] angle_out;
wire [7:0] ma1_1;
wire [7:0] ma1_2;
wire [7:0] ma1_3;
wire [7:0] ma2_1;
wire [7:0] ma2_2;
wire [7:0] ma2_3;
wire [7:0] ma3_1;
wire [7:0] ma3_2;
wire [7:0] ma3_3;
reg edge_de_a;
reg edge_de_b;
wire edge_de;
reg [9:0] row_cnt;
reg[1:0] sign;
reg type;
reg [8:0] type_d;
wire sobel_vsync;
wire sobel_href;
wire sobel_clken;
matrix_generate_3x3 #(
.DATA_WIDTH(DATA_WIDTH),
.DATA_DEPTH(DATA_DEPTH)
)u_matrix_generate_3x3(
.clk(clk),
.rst_n(rst_s),
.per_frame_vsync(mediant_vs),
.per_frame_href(mediant_hs),
.per_frame_clken(mediant_de),
.per_img_y(mediant_img),
.matrix_frame_vsync(sobel_vsync),
.matrix_frame_href(sobel_href),
.matrix_frame_clken(sobel_clken),
.matrix_p11(ma1_1),
.matrix_p12(ma1_2),
.matrix_p13(ma1_3),
.matrix_p21(ma2_1),
.matrix_p22(ma2_2),
.matrix_p23(ma2_3),
.matrix_p31(ma3_1),
.matrix_p32(ma3_2),
.matrix_p33(ma3_3)
);
always @ (posedge clk or negedge rst_s) begin
if(!rst_s) begin
Gx_1 <= 10'd0;
Gx_3 <= 10'd0;
end
else begin
Gx_1 <= {2'b00,ma1_1} + {1'b0,ma2_1,1'b0} + {2'b0,ma3_1};
Gx_3 <= {2'b00,ma1_3} + {1'b0,ma2_3,1'b0} + {2'b0,ma3_3};
end
end
always @ (posedge clk or negedge rst_s) begin
if(!rst_s) begin
Gy_1 <= 10'd0;
Gy_3 <= 10'd0;
end
else begin
Gy_1 <= {2'b00,ma1_1} + {1'b0,ma1_2,1'b0} + {2'b0,ma1_3};
Gy_3 <= {2'b00,ma3_1} + {1'b0,ma3_2,1'b0} + {2'b0,ma3_3};
end
end
always @(posedge clk or negedge rst_s) begin
if(!rst_s) begin
Gx <= 11'd0;
Gy <= 11'd0;
sign <= 2'b00;
end
else begin
Gx <= (Gx_1 >= Gx_3)? Gx_1 - Gx_3 : Gx_3 - Gx_1;
Gy <= (Gy_1 >= Gy_3)? Gy_1 - Gy_3 : Gy_3 - Gy_1;
sign[0] <= (Gx_1 >= Gx_3)? 1'b1 : 1'b0;
sign[1] <= (Gy_1 >= Gy_3)? 1'b1 : 1'b0;
end
end
always @ (posedge clk or negedge rst_s) begin
if(!rst_s)
type <= 1'b0;
else if(sign[0]^sign[1])
type <= 1'b1;
else
type <= 1'b0;
end
always@(posedge clk or negedge rst_s)begin
if(!rst_s)begin
type_d <= 0;
end
else begin
type_d <= {type_d[7:0],type};
end
end
wire path_fou_f;
wire path_thr_f;
wire path_two_f;
wire path_one_f;
cordic_sqrt#(
.DATA_WIDTH_IN(11),
.DATA_WIDTH_OUT(22),
.Pipeline(9)
)u_cordic_sqrt(
.clk(clk),
.rst_n(rst_s),
.sqrt_in_0(Gx),
.sqrt_in_1(Gy),
.sqrt_out(sqrt_out_n),
.angle_out(angle_out)
);
assign start = (path_one_f | path_thr_f) ? 1'b0 : 1'b1;
assign path_fou_f = (start) ? type_d[8] : 1'b0;
assign path_thr_f = (angle_out << 2) > 135 ? 1'b1 : 1'b0;
assign path_two_f = (start) ? ~type_d[8] : 1'b0;
assign path_one_f = (angle_out << 2) < 45 ? 1'b1 : 1'b0;
always @(posedge clk or negedge rst_s)begin
if(!rst_s)
gra_path <= 16'd0;
else if (sqrt_out_n > THRESHOLD_HIGH)
gra_path <= {1'b1,1'b0,path_fou_f,path_thr_f,path_two_f,path_one_f,sqrt_out_n[9:0]};
else if (sqrt_out_n > THRESHOLD_LOW)
gra_path <= {1'b0,1'b1,path_fou_f,path_thr_f,path_two_f,path_one_f,sqrt_out_n[9:0]};
else
gra_path <= 16'd0;
end
reg [10:0] sobel_vsync_t;
reg [10:0] sobel_href_t;
reg [10:0] sobel_clken_t;
always@(posedge clk or negedge rst_s) begin
if (!rst_s) begin
sobel_vsync_t <= 11'd0;
sobel_href_t <= 11'd0;
sobel_clken_t <= 11'd0;
end
else begin
sobel_vsync_t <= {sobel_vsync_t[9:0], sobel_vsync};
sobel_href_t <= {sobel_href_t[9:0], sobel_href};
sobel_clken_t <= {sobel_clken_t[9:0], sobel_clken};
end
end
assign grandient_hs = sobel_href_t[10];
assign grandient_vs = sobel_vsync_t[10];
assign grandient_de = sobel_clken_t[10];
endmodule
module canny_nonLocalMaxValue#(
parameter DATA_WIDTH = 16,
parameter DATA_DEPTH = 512
)(
input clk,
input rst_s,
input grandient_vs,
input grandient_hs,
input grandient_de,
input [DATA_WIDTH - 1 : 0] gra_path,
output post_frame_vsync,
output post_frame_href,
output post_frame_clken,
output reg [1 : 0] max_g
);
wire [DATA_WIDTH - 1 : 0] max1_1;
wire [DATA_WIDTH - 1 : 0] max1_2;
wire [DATA_WIDTH - 1 : 0] max1_3;
wire [DATA_WIDTH - 1 : 0] max2_1;
wire [DATA_WIDTH - 1 : 0] max2_2;
wire [DATA_WIDTH - 1 : 0] max2_3;
wire [DATA_WIDTH - 1 : 0] max3_1;
wire [DATA_WIDTH - 1 : 0] max3_2;
wire [DATA_WIDTH - 1 : 0] max3_3;
wire [3 : 0] path_se;
wire nonLocalMaxValue_vsync;
wire nonLocalMaxValue_href;
wire nonLocalMaxValue_clken;
matrix_generate_3x3 #(
.DATA_WIDTH(DATA_WIDTH),
.DATA_DEPTH(DATA_DEPTH)
)u_matrix_generate_3x3(
.clk(clk),
.rst_n(rst_s),
.per_frame_vsync(grandient_vs),
.per_frame_href(grandient_hs),
.per_frame_clken(grandient_de),
.per_img_y(gra_path),
.matrix_frame_vsync(nonLocalMaxValue_vsync),
.matrix_frame_href(nonLocalMaxValue_href),
.matrix_frame_clken(nonLocalMaxValue_clken),
.matrix_p11(max1_1),
.matrix_p12(max1_2),
.matrix_p13(max1_3),
.matrix_p21(max2_1),
.matrix_p22(max2_2),
.matrix_p23(max2_3),
.matrix_p31(max3_1),
.matrix_p32(max3_2),
.matrix_p33(max3_3)
);
assign path_se = max2_2[13:10];
always @ (posedge clk or negedge rst_s) begin
if(!rst_s)
max_g <= 2'd0;
else
case (path_se)
4'b0001:
max_g <= ((max2_2[9:0] > max2_1[9:0]) & (max2_2[9:0] > max2_3[9:0])) ? {max2_2[15:14]} : 2'd0;
4'b0010:
max_g <= ((max2_2[9:0] > max1_3[9:0]) & (max2_2[9:0] > max3_1[9:0])) ? {max2_2[15:14]} : 2'd0;
4'b0100:
max_g <= ((max2_2[9:0] > max1_2[9:0]) & (max2_2[9:0] > max3_2[9:0])) ? {max2_2[15:14]} : 2'd0;
4'b1000:
max_g <= ((max2_2[9:0] > max1_1[9:0]) & (max2_2[9:0] > max3_3[9:0])) ? {max2_2[15:14]} : 2'd0;
default:
max_g <= 2'd0;
endcase
end
reg nonLocalMaxValue_vsync_d1;
reg nonLocalMaxValue_href_d1;
reg nonLocalMaxValue_clken_d1;
always@(posedge clk or negedge rst_s) begin
if (!rst_s)begin
nonLocalMaxValue_vsync_d1 <= 0;
nonLocalMaxValue_href_d1 <= 0;
nonLocalMaxValue_clken_d1 <= 0;
end
else begin
nonLocalMaxValue_vsync_d1 <= nonLocalMaxValue_vsync;
nonLocalMaxValue_href_d1 <= nonLocalMaxValue_href;
nonLocalMaxValue_clken_d1 <= nonLocalMaxValue_clken;
end
end
assign post_frame_vsync = nonLocalMaxValue_vsync_d1;
assign post_frame_href = nonLocalMaxValue_href_d1;
assign post_frame_clken = nonLocalMaxValue_clken_d1;
endmodule
module cordic_pipline#(
parameter DATA_WIDTH_IN = 11,
parameter Pipeline = 16
)(
input clk,
input rst_n,
input signed [DATA_WIDTH_IN - 1 : 0] x_in,
input signed [DATA_WIDTH_IN - 1 : 0] y_in,
input polar_flag,
input [5 : 0] pipline_level,
input [31 : 0] rot,
input [31 : 0] rot_in,
output reg [31 : 0] rot_out,
output reg signed [DATA_WIDTH_IN - 1 : 0] x_out,
output reg signed [DATA_WIDTH_IN - 1 : 0] y_out
);
always @ (posedge clk or negedge rst_n) begin
if(!rst_n) begin
x_out <= 0;
y_out <= 0;
end
else begin
if(polar_flag)begin
x_out <= x_in + (y_in >>> pipline_level);
y_out <= y_in - (x_in >>> pipline_level);
rot_out <= rot_in + rot;
end
else begin
x_out <= x_in - (y_in >>> pipline_level);
y_out <= y_in + (x_in >>> pipline_level);
rot_out <= rot_in - rot;
end
end
end
endmodule
module cordic_sqrt#(
parameter DATA_WIDTH_IN = 11,
parameter DATA_WIDTH_OUT = 22,
parameter Pipeline = 8
)(
input clk,
input rst_n,
input [DATA_WIDTH_IN - 1 : 0] sqrt_in_0,
input [DATA_WIDTH_IN - 1 : 0] sqrt_in_1,
output [DATA_WIDTH_OUT - 1: 0] sqrt_out,
output [6 : 0] angle_out
);
parameter rot0 = 32'd2949120;
parameter rot1 = 32'd1740992;
parameter rot2 = 32'd919872;
parameter rot3 = 32'd466944;
parameter rot4 = 32'd234368;
parameter rot5 = 32'd117312;
parameter rot6 = 32'd58688;
parameter rot7 = 32'd29312;
parameter rot8 = 32'd14656;
parameter rot9 = 32'd7360;
parameter rot10 = 32'd3648;
parameter rot11 = 32'd1856;
parameter rot12 = 32'd896;
parameter rot13 = 32'd448;
parameter rot14 = 32'd256;
parameter rot15 = 32'd128;
parameter K = 32'h09b74;
wire signed [DATA_WIDTH_IN - 1 : 0] x[16 : 0];
wire signed [DATA_WIDTH_IN - 1 : 0] y[16 : 0];
wire [31 : 0] rot_out[15 : 0];
assign x[0] = sqrt_in_0;
assign y[0] = sqrt_in_1;
// NOTE: the 16 cascaded cordic_pipline instantiations were truncated in the
// original listing. The generate loop below is a reconstruction of the
// intended vectoring-mode cascade: stage k applies micro-rotation angle
// rot<k>, driving y toward zero while accumulating the rotation angle.
wire [31:0] rot_lut [0:15];
assign rot_lut[0]  = rot0;  assign rot_lut[1]  = rot1;
assign rot_lut[2]  = rot2;  assign rot_lut[3]  = rot3;
assign rot_lut[4]  = rot4;  assign rot_lut[5]  = rot5;
assign rot_lut[6]  = rot6;  assign rot_lut[7]  = rot7;
assign rot_lut[8]  = rot8;  assign rot_lut[9]  = rot9;
assign rot_lut[10] = rot10; assign rot_lut[11] = rot11;
assign rot_lut[12] = rot12; assign rot_lut[13] = rot13;
assign rot_lut[14] = rot14; assign rot_lut[15] = rot15;
wire [31:0] rot_chain [0:16];
assign rot_chain[0] = 32'd0;
genvar k;
generate
for(k = 0; k < 16; k = k + 1) begin : cordic_stage
cordic_pipline#(
.DATA_WIDTH_IN(DATA_WIDTH_IN),
.Pipeline(Pipeline)
)u_cordic_pipline(
.clk(clk), .rst_n(rst_n),
.x_in(x[k]), .y_in(y[k]),
.polar_flag(~y[k][DATA_WIDTH_IN-1]), // rotate so y shrinks toward 0
.pipline_level(k),
.rot(rot_lut[k]),
.rot_in(rot_chain[k]), .rot_out(rot_chain[k+1]),
.x_out(x[k+1]), .y_out(y[k+1])
);
assign rot_out[k] = rot_chain[k+1];
end
endgenerate
assign sqrt_out = x[Pipeline - 1] * K >> 16;
assign angle_out = rot_out[Pipeline - 1] >> 16;
endmodule
module fifo_ram#(
parameter DATA_WIDTH = 8,
parameter DATA_DEPTH = 512
)
(
input clk,
input wr_en,
input [DATA_WIDTH - 1 : 0] wr_data,
output wr_full,
input rd_en,
output reg [DATA_WIDTH - 1 : 0] rd_data,
output rd_empty
);
(*ram_style = "block" *) reg [DATA_WIDTH - 1 : 0] fifo_buffer[DATA_DEPTH - 1 : 0];
integer i;
initial begin
for(i = 0;i<DATA_DEPTH;i = i + 1)begin
fifo_buffer[i] <= 0;
end
end
reg [$clog2(DATA_DEPTH) - 1 : 0] wr_pointer = 0;
reg [$clog2(DATA_DEPTH) - 1 : 0] rd_pointer = 0;
wire [DATA_WIDTH - 1 : 0] rd_data_out;
always @(posedge clk) begin
if (wr_en) begin
if (wr_pointer == DATA_DEPTH - 1) begin
wr_pointer <= 0;
end
else begin
wr_pointer <= wr_pointer + 1;
end
end
end
always @(posedge clk) begin
if (rd_en) begin
if (rd_pointer == DATA_DEPTH - 1) begin
rd_pointer <= 0;
end
else begin
rd_pointer <= rd_pointer + 1;
end
end
end
always @(posedge clk) begin
if (wr_en) begin
fifo_buffer[wr_pointer] <= wr_data;
end
end
assign rd_data_out = rd_en ? fifo_buffer[rd_pointer] : 0;
always @(posedge clk) begin
rd_data <= rd_data_out;
end
endmodule
module matrix_generate_3x3#(
parameter DATA_WIDTH = 8,
parameter DATA_DEPTH = 512
)
(
input clk,
input rst_n,
input per_frame_vsync,
input per_frame_href,
input per_frame_clken,
input [DATA_WIDTH - 1 :0] per_img_y,
output matrix_frame_vsync,
output matrix_frame_href,
output matrix_frame_clken,
output reg [DATA_WIDTH - 1 :0] matrix_p11,
output reg [DATA_WIDTH - 1 :0] matrix_p12,
output reg [DATA_WIDTH - 1 :0] matrix_p13,
output reg [DATA_WIDTH - 1 :0] matrix_p21,
output reg [DATA_WIDTH - 1 :0] matrix_p22,
output reg [DATA_WIDTH - 1 :0] matrix_p23,
output reg [DATA_WIDTH - 1 :0] matrix_p31,
output reg [DATA_WIDTH - 1 :0] matrix_p32,
output reg [DATA_WIDTH - 1 :0] matrix_p33
);
wire [DATA_WIDTH - 1 : 0] row1_data;
wire [DATA_WIDTH - 1 : 0] row2_data;
wire [DATA_WIDTH - 1 : 0] row3_data;
wire read_frame_href;
wire read_frame_clken;
reg [1:0] per_frame_vsync_r;
reg [1:0] per_frame_href_r;
reg [1:0] per_frame_clken_r;
assign read_frame_href = per_frame_href_r[0];
assign read_frame_clken = per_frame_clken_r[0];
assign matrix_frame_vsync = per_frame_vsync_r[1];
assign matrix_frame_href = per_frame_href_r[1];
assign matrix_frame_clken = per_frame_clken_r[1];
one_column_ram #(
.DATA_WIDTH(DATA_WIDTH),
.DATA_DEPTH(DATA_DEPTH)
)u_one_column_ram(
.clock(clk),
.clken(per_frame_clken),
.shiftin(per_img_y),
.taps0x(row3_data),
.taps1x(row2_data),
.taps2x(row1_data)
);
always@(posedge clk or negedge rst_n) begin
if(!rst_n) begin
per_frame_vsync_r <= 0;
per_frame_href_r <= 0;
per_frame_clken_r <= 0;
end
else begin
per_frame_vsync_r <= { per_frame_vsync_r[0], per_frame_vsync };
per_frame_href_r <= { per_frame_href_r[0], per_frame_href };
per_frame_clken_r <= { per_frame_clken_r[0], per_frame_clken };
end
end
always@(posedge clk or negedge rst_n) begin
if(!rst_n) begin
{matrix_p11, matrix_p12, matrix_p13} <= 24'h0;
{matrix_p21, matrix_p22, matrix_p23} <= 24'h0;
{matrix_p31, matrix_p32, matrix_p33} <= 24'h0;
end
else if(read_frame_href) begin
if(read_frame_clken) begin
{matrix_p11, matrix_p12, matrix_p13} <= {matrix_p12, matrix_p13, row1_data};
{matrix_p21, matrix_p22, matrix_p23} <= {matrix_p22, matrix_p23, row2_data};
{matrix_p31, matrix_p32, matrix_p33} <= {matrix_p32, matrix_p33, row3_data};
end
else begin
{matrix_p11, matrix_p12, matrix_p13} <= {matrix_p11, matrix_p12, matrix_p13};
{matrix_p21, matrix_p22, matrix_p23} <= {matrix_p21, matrix_p22, matrix_p23};
{matrix_p31, matrix_p32, matrix_p33} <= {matrix_p31, matrix_p32, matrix_p33};
end
end
else begin
{matrix_p11, matrix_p12, matrix_p13} <= 24'h0;
{matrix_p21, matrix_p22, matrix_p23} <= 24'h0;
{matrix_p31, matrix_p32, matrix_p33} <= 24'h0;
end
end
endmodule
module image_gaussian_filter
(
input wire clk,
input wire rst_n,
input wire per_frame_vsync,
input wire per_frame_href,
input wire per_frame_clken,
input wire [7:0] per_img_gray,
output wire post_frame_vsync,
output wire post_frame_href,
output wire post_frame_clken,
output wire [7:0] post_img_gray
);
wire matrix_generator_vsync;
wire matrix_generator_href;
wire matrix_generator_clken;
reg matrix_generator_vsync_d1;
reg matrix_generator_href_d1;
reg matrix_generator_clken_d1;
reg matrix_generator_vsync_d2;
reg matrix_generator_href_d2;
reg matrix_generator_clken_d2;
reg [11 : 0] sum_gray1;
reg [11 : 0] sum_gray2;
reg [11 : 0] sum_gray3;
reg [11 : 0] sum_gray;
wire [7 : 0] gray_temp_11;
wire [7 : 0] gray_temp_12;
wire [7 : 0] gray_temp_13;
wire [7 : 0] gray_temp_21;
wire [7 : 0] gray_temp_22;
wire [7 : 0] gray_temp_23;
wire [7 : 0] gray_temp_31;
wire [7 : 0] gray_temp_32;
wire [7 : 0] gray_temp_33;
matrix_generate_3x3 #(
.DATA_WIDTH(8),
.DATA_DEPTH(640)
)u_matrix_generate_3x3(
.clk(clk),
.rst_n(rst_n),
.per_frame_vsync(per_frame_vsync),
.per_frame_href(per_frame_href),
.per_frame_clken(per_frame_clken),
.per_img_y(per_img_gray),
.matrix_frame_vsync(matrix_generator_vsync),
.matrix_frame_href(matrix_generator_href),
.matrix_frame_clken(matrix_generator_clken),
.matrix_p11(gray_temp_11),
.matrix_p12(gray_temp_12),
.matrix_p13(gray_temp_13),
.matrix_p21(gray_temp_21),
.matrix_p22(gray_temp_22),
.matrix_p23(gray_temp_23),
.matrix_p31(gray_temp_31),
.matrix_p32(gray_temp_32),
.matrix_p33(gray_temp_33)
);
always@(posedge clk or negedge rst_n) begin
if(!rst_n)begin
sum_gray1 <= 0;
sum_gray2 <= 0;
sum_gray3 <= 0;
end
else begin
sum_gray1 <= (gray_temp_11) + (gray_temp_12 << 1) + (gray_temp_13);
sum_gray2 <= (gray_temp_21 << 1) + (gray_temp_22 << 2) + (gray_temp_23 << 1);
sum_gray3 <= (gray_temp_31) + (gray_temp_32 << 1) + (gray_temp_33);
end
end
always@(posedge clk or negedge rst_n)begin
if(!rst_n)begin
sum_gray <= 0;
end
else begin
sum_gray <= sum_gray1 + sum_gray2 + sum_gray3;
end
end
always @(posedge clk or negedge rst_n) begin
if(!rst_n)begin
matrix_generator_vsync_d1 <= 0;
matrix_generator_href_d1 <= 0;
matrix_generator_clken_d1 <= 0;
matrix_generator_vsync_d2 <= 0;
matrix_generator_href_d2 <= 0;
matrix_generator_clken_d2 <= 0;
end
else begin
matrix_generator_vsync_d1 <= matrix_generator_vsync;
matrix_generator_href_d1 <= matrix_generator_href;
matrix_generator_clken_d1 <= matrix_generator_clken;
matrix_generator_vsync_d2 <= matrix_generator_vsync_d1;
matrix_generator_href_d2 <= matrix_generator_href_d1;
matrix_generator_clken_d2 <= matrix_generator_clken_d1;
end
end
assign post_frame_vsync = matrix_generator_vsync_d2;
assign post_frame_href = matrix_generator_href_d2;
assign post_frame_clken = matrix_generator_clken_d2;
assign post_img_gray = sum_gray >> 4;
endmodule
module one_column_ram#(
parameter DATA_WIDTH = 8,
parameter DATA_DEPTH = 512
)(
input clock,
input clken,
input [DATA_WIDTH - 1 : 0] shiftin,
output [DATA_WIDTH - 1 : 0] taps0x,
output [DATA_WIDTH - 1 : 0] taps1x,
output [DATA_WIDTH - 1 : 0] taps2x
);
wire [DATA_WIDTH - 1 : 0] fifo_rd_data0;
reg [DATA_WIDTH - 1 : 0] fifo_rd_data0_d1;
wire [DATA_WIDTH - 1 : 0] fifo_rd_data1;
reg clken_d1;
reg clken_d2;
reg [DATA_WIDTH - 1 : 0] shiftin_d1;
reg [DATA_WIDTH - 1 : 0] shiftin_d2;
always@(posedge clock)begin
clken_d1 <= clken;
clken_d2 <= clken_d1;
shiftin_d1 <= shiftin;
shiftin_d2 <= shiftin_d1;
fifo_rd_data0_d1 <= fifo_rd_data0;
end
fifo_ram#(
.DATA_WIDTH(DATA_WIDTH),
.DATA_DEPTH(DATA_DEPTH)
)
u_fifo_ram0(
.clk(clock),
.wr_en(clken_d2),
.wr_data(shiftin_d2),
.wr_full(),
.rd_en(clken),
.rd_data(fifo_rd_data0),
.rd_empty()
);
fifo_ram#(
.DATA_WIDTH(DATA_WIDTH),
.DATA_DEPTH(DATA_DEPTH)
)
u_fifo_ram1(
.clk(clock),
.wr_en(clken_d2),
.wr_data(fifo_rd_data0_d1),
.wr_full(),
.rd_en(clken),
.rd_data(fifo_rd_data1),
.rd_empty()
);
assign taps0x = shiftin_d1;
assign taps1x = fifo_rd_data0;
assign taps2x = fifo_rd_data1;
endmodule
`timescale 1ns/1ns
module VIP_RGB888_YCbCr444
(
input clk,
input rst_n,
input per_frame_vsync,
input per_frame_href,
input per_frame_clken,
input [7:0] per_img_red,
input [7:0] per_img_green,
input [7:0] per_img_blue,
output post_frame_vsync,
output post_frame_href,
output post_frame_clken,
output [7:0] post_img_Y,
output [7:0] post_img_Cb,
output [7:0] post_img_Cr
);
reg [15:0] img_red_r0, img_red_r1, img_red_r2;
reg [15:0] img_green_r0, img_green_r1, img_green_r2;
reg [15:0] img_blue_r0, img_blue_r1, img_blue_r2;
always@(posedge clk or negedge rst_n)
begin
if(!rst_n)
begin
img_red_r0 <= 0;
img_red_r1 <= 0;
img_red_r2 <= 0;
img_green_r0 <= 0;
img_green_r1 <= 0;
img_green_r2 <= 0;
img_blue_r0 <= 0;
img_blue_r1 <= 0;
img_blue_r2 <= 0;
end
else
begin
img_red_r0 <= per_img_red * 8'd77;
img_red_r1 <= per_img_red * 8'd43;
img_red_r2 <= per_img_red * 8'd128;
img_green_r0 <= per_img_green * 8'd150;
img_green_r1 <= per_img_green * 8'd85;
img_green_r2 <= per_img_green * 8'd107;
img_blue_r0 <= per_img_blue * 8'd29;
img_blue_r1 <= per_img_blue * 8'd128;
img_blue_r2 <= per_img_blue * 8'd21;
end
end
reg [15:0] img_Y_r0;
reg [15:0] img_Cb_r0;
reg [15:0] img_Cr_r0;
always@(posedge clk or negedge rst_n)
begin
if(!rst_n)
begin
img_Y_r0 <= 0;
img_Cb_r0 <= 0;
img_Cr_r0 <= 0;
end
else
begin
img_Y_r0 <= img_red_r0 + img_green_r0 + img_blue_r0;
img_Cb_r0 <= img_blue_r1 - img_red_r1 - img_green_r1 + 16'd32768;
img_Cr_r0 <= img_red_r2 + img_green_r2 + img_blue_r2 + 16'd32768;
end
end
reg [7:0] img_Y_r1;
reg [7:0] img_Cb_r1;
reg [7:0] img_Cr_r1;
always@(posedge clk or negedge rst_n)
begin
if(!rst_n)
begin
img_Y_r1 <= 0;
img_Cb_r1 <= 0;
img_Cr_r1 <= 0;
end
else
begin
img_Y_r1 <= img_Y_r0[15:8];
img_Cb_r1 <= img_Cb_r0[15:8];
img_Cr_r1 <= img_Cr_r0[15:8];
end
end
reg [2:0] per_frame_vsync_r;
reg [2:0] per_frame_href_r;
reg [2:0] per_frame_clken_r;
always@(posedge clk or negedge rst_n)
begin
if(!rst_n)
begin
per_frame_vsync_r <= 0;
per_frame_href_r <= 0;
per_frame_clken_r <= 0;
end
else
begin
per_frame_vsync_r <= {per_frame_vsync_r[1:0], per_frame_vsync};
per_frame_href_r <= {per_frame_href_r[1:0], per_frame_href};
per_frame_clken_r <= {per_frame_clken_r[1:0], per_frame_clken};
end
end
assign post_frame_vsync = per_frame_vsync_r[2];
assign post_frame_href = per_frame_href_r[2];
assign post_frame_clken = per_frame_clken_r[2];
assign post_img_Y = post_frame_href ? img_Y_r1 : 8'd0;
assign post_img_Cb = post_frame_href ? img_Cb_r1 : 8'd0;
assign post_img_Cr = post_frame_href ? img_Cr_r1 : 8'd0;
endmodule
module sim_cmos#(
parameter PIC_PATH = "C:/Users/Ahmed/Desktop/new/5.pic/duck_fog.bmp"
, parameter IMG_HDISP = 640
, parameter IMG_VDISP = 480
)(
input clk,
input rst_n,
output CMOS_VSYNC,
output CMOS_HREF,
output CMOS_CLKEN,
output [23:0] CMOS_DATA,
output [10:0] X_POS,
output [10:0] Y_POS
);
integer iBmpFileId;
integer oTxtFileId;
integer iIndex = 0;
integer iCode;
integer iBmpWidth;
integer iBmpHight;
integer iBmpSize;
integer iDataStartIndex;
localparam BMP_SIZE = 54 + IMG_HDISP * IMG_VDISP * 3 - 1;
reg [ 7:0] rBmpData [0:BMP_SIZE];
integer i,j;
//--------------------------------------------
initial begin
iBmpFileId = $fopen(PIC_PATH,"rb");
iCode = $fread(rBmpData,iBmpFileId);
iBmpWidth = {rBmpData[21],rBmpData[20],rBmpData[19],rBmpData[18]};
iBmpHight = {rBmpData[25],rBmpData[24],rBmpData[23],rBmpData[22]};
iBmpSize = {rBmpData[5],rBmpData[4],rBmpData[3],rBmpData[2]};
iDataStartIndex = {rBmpData[13],rBmpData[12],rBmpData[11],rBmpData[10]};
$fclose(iBmpFileId);
end
wire cmos_vsync;
reg cmos_href;
wire cmos_clken;
reg [23:0] cmos_data;
reg cmos_clken_r;
reg [31:0] cmos_index;
localparam H_SYNC = 11'd10;
localparam H_BACK = 11'd10;
localparam H_DISP = IMG_HDISP;
localparam H_FRONT = 11'd10;
localparam H_TOTAL = H_SYNC + H_BACK + H_DISP + H_FRONT;
localparam V_SYNC = 11'd10;
localparam V_BACK = 11'd10;
localparam V_DISP = IMG_VDISP;
localparam V_FRONT = 11'd10;
localparam V_TOTAL = V_SYNC + V_BACK + V_DISP + V_FRONT;
//--------------------------------------------
always@(posedge clk or negedge rst_n) begin
if(!rst_n)
cmos_clken_r <= 0;
else
cmos_clken_r <= ~cmos_clken_r;
end
reg [10:0] hcnt;
always@(posedge clk or negedge rst_n) begin
if(!rst_n)
hcnt <= 11'd0;
else if(cmos_clken_r)
hcnt <= (hcnt < H_TOTAL - 1'b1) ? hcnt + 1'b1 : 11'd0;
end
reg [10:0] vcnt;
always@(posedge clk or negedge rst_n) begin
if(!rst_n)
vcnt <= 11'd0;
else if(cmos_clken_r) begin
if(hcnt == H_TOTAL - 1'b1)
vcnt <= (vcnt < V_TOTAL - 1'b1) ? vcnt + 1'b1 : 11'd0;
else
vcnt <= vcnt;
end
end
reg cmos_vsync_r;
always@(posedge clk or negedge rst_n) begin
if(!rst_n)
cmos_vsync_r <= 1'b0;
else begin
if(vcnt <= V_SYNC - 1'b1)
cmos_vsync_r <= 1'b0;
else
cmos_vsync_r <= 1'b1;
end
end
assign cmos_vsync = cmos_vsync_r;
wire frame_valid_ahead =
( vcnt >= V_SYNC + V_BACK && vcnt < V_SYNC + V_BACK + V_DISP
&& hcnt >= H_SYNC + H_BACK && hcnt < H_SYNC + H_BACK + H_DISP )
? 1'b1 : 1'b0;
reg cmos_href_r;
always@(posedge clk or negedge rst_n) begin
if(!rst_n)
cmos_href_r <= 0;
else begin
if(frame_valid_ahead)
cmos_href_r <= 1;
else
cmos_href_r <= 0;
end
end
always@(posedge clk or negedge rst_n) begin
if(!rst_n)
cmos_href <= 0;
else
cmos_href <= cmos_href_r;
end
assign cmos_clken = cmos_href & cmos_clken_r;
wire [10:0] x_pos;
wire [10:0] y_pos;
assign x_pos = frame_valid_ahead ? (hcnt - (H_SYNC + H_BACK )) : 0;
assign y_pos = frame_valid_ahead ? (vcnt - (V_SYNC + V_BACK )) : 0;
always@(posedge clk or negedge rst_n)begin
if(!rst_n) begin
cmos_index <= 0;
cmos_data <= 24'd0;
end
else begin
cmos_index <= y_pos * IMG_HDISP * 3 + x_pos * 3 + 54;
cmos_data <= {rBmpData[cmos_index], rBmpData[cmos_index+1], rBmpData[cmos_index+2]};
end
end
reg [10:0] x_pos_d [0 : 10];
reg [10:0] y_pos_d [0 : 10];
always@(posedge clk or negedge rst_n)begin
if(!rst_n)begin
for(i = 0; i < 11; i = i + 1)begin
x_pos_d[i] <= 0;
y_pos_d[i] <= 0;
end
end
else begin
x_pos_d[0] <= x_pos;
y_pos_d[0] <= y_pos;
for(i = 1; i < 11; i = i + 1)begin
x_pos_d[i] <= x_pos_d[i-1];
y_pos_d[i] <= y_pos_d[i-1];
end
end
end
assign CMOS_VSYNC = cmos_vsync;
assign CMOS_HREF  = cmos_href;
assign CMOS_CLKEN = cmos_clken;
assign CMOS_DATA  = cmos_data;
assign X_POS      = x_pos;
assign Y_POS      = y_pos;
endmodule
`timescale 1ns / 1ps
module canny_tb();
`define Vivado_Sim
`ifdef Vivado_Sim
localparam PIC_INPUT_PATH  = "C:/Users/Ahmed/Desktop/new/5.pic/monkey.bmp";
localparam PIC_OUTPUT_PATH = "C:/Users/Ahmed/Desktop/new/5.pic/outcome_new.bmp";
`endif
localparam PIC_WIDTH  = 640;
localparam PIC_HEIGHT = 480;

reg  cmos_clk   = 0;
reg  cmos_rst_n = 0;
wire cmos_vsync;
wire cmos_href;
wire cmos_clken;
wire [23:0] cmos_data;

parameter cmos0_period = 6;
always #(cmos0_period/2) cmos_clk = ~cmos_clk;
initial #(20*cmos0_period) cmos_rst_n = 1;
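With `cmos0_period = 6` ns and `cmos_clken_r` toggling every cycle, pixels advance at half the clock rate. A short worked-numbers sketch, assuming standard VGA 800x525 frame totals (the real totals are parameters of `sim_cmos`, not stated here):

```python
# Worked numbers for the simulated pixel clock: cmos_clk runs at a 6 ns
# period (166.67 MHz) and cmos_clken enables every other cycle, so the
# pixel rate is half of that. The 800x525 frame total is an ASSUMPTION
# (standard VGA 640x480 timing).
cmos0_period_ns = 6
clk_hz = 1e9 / cmos0_period_ns     # 166.67 MHz simulation clock
pixel_rate_hz = clk_hz / 2         # clken halves the effective rate
H_TOTAL, V_TOTAL = 800, 525
frame_time_ns = H_TOTAL * V_TOTAL * 2 * cmos0_period_ns

print(round(pixel_rate_hz / 1e6, 2))  # 83.33 (MHz pixel rate)
print(frame_time_ns / 1e6)            # 5.04 (ms per simulated frame)
```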
//-------------------------------------------------
//Camera Simulation
sim_cmos #(
    .PIC_PATH    (PIC_INPUT_PATH ),
    .IMG_HDISP   (PIC_WIDTH      ),
    .IMG_VDISP   (PIC_HEIGHT     )
) u_sim_cmos0 (
    .clk         (cmos_clk       ),
    .rst_n       (cmos_rst_n     ),
    .CMOS_VSYNC  (cmos_vsync     ),
    .CMOS_HREF   (cmos_href      ),
    .CMOS_CLKEN  (cmos_clken     ),
    .CMOS_DATA   (cmos_data      ),
    .X_POS       (               ),
    .Y_POS       (               )
);
//-------------------------------------------------
//Image Processing
wire       post0_vsync;
wire       post0_href;
wire       post0_clken;
wire [7:0] post0_img_Y;
wire [7:0] post0_img_Cb;
wire [7:0] post0_img_Cr;

wire       gauss_vsync;
wire       gauss_hsync;
wire       gauss_de;
wire [7:0] img_gauss;

wire       canny_vsync;
wire       canny_hsync;
wire       canny_de;
wire       img_canny;
//RGB888 to YCbCr444
VIP_RGB888_YCbCr444 u_VIP_RGB888_YCbCr444 (
    .clk              (cmos_data_clk     ),
    .rst_n            (cmos_rst_n        ),
    .per_frame_vsync  (cmos_vsync        ),
    .per_frame_href   (cmos_href         ),
    .per_frame_clken  (cmos_clken        ),
    .per_img_red      (cmos_data[16+:8]  ),
    .per_img_green    (cmos_data[ 8+:8]  ),
    .per_img_blue     (cmos_data[ 0+:8]  ),
    .post_frame_vsync (post0_vsync       ),
    .post_frame_href  (post0_href        ),
    .post_frame_clken (post0_clken       ),
    .post_img_Y       (post0_img_Y       ),
    .post_img_Cb      (post0_img_Cb      ),
    .post_img_Cr      (post0_img_Cr      )
);
//Gaussian Filter
image_gaussian_filter u_image_gaussian_filter (
    .clk              (cmos_clk    ),
    .rst_n            (cmos_rst_n  ),
    .per_frame_vsync  (post0_vsync ),
    .per_frame_href   (post0_href  ),
    .per_frame_clken  (post0_clken ),
    .per_img_gray     (post0_img_Y ),
    .post_frame_vsync (gauss_vsync ),
    .post_frame_href  (gauss_hsync ),
    .post_frame_clken (gauss_de    ),
    .post_img_gray    (img_gauss   )
);
//Canny Edge Detection
canny_edge_detect_top u_canny_edge_detect_top (
    .clk              (cmos_clk    ),
    .rst_n            (cmos_rst_n  ),
    .per_frame_vsync  (gauss_vsync ),
    .per_frame_href   (gauss_hsync ),
    .per_frame_clken  (gauss_de    ),
    .per_img_y        (img_gauss   ),
    .post_frame_vsync (canny_vsync ),
    .post_frame_href  (canny_hsync ),
    .post_frame_clken (canny_de    ),
    .post_img_bit     (img_canny   )
);
//-------------------------------------------------
//Video saving
video_to_pic #(
    .PIC_PATH    (PIC_OUTPUT_PATH ),
    .START_FRAME (1               ),
    .IMG_HDISP   (PIC_WIDTH       ),
    .IMG_VDISP   (PIC_HEIGHT      )
) u_video_to_pic0 (
    .clk         (cmos_clk        ),
    .rst_n       (cmos_rst_n      ),
    .video_vsync (canny_vsync     ),
    .video_hsync (canny_hsync     ),
    .video_de    (canny_de        ),
    .video_data  ({24{img_canny}} )
);
endmodule
`timescale 1ns / 1ns
module video_to_pic#(
    parameter PIC_PATH    = "C:/Users/Ahmed/Desktop/new/5.pic/outcome_new.bmp",
    parameter START_FRAME = 1,
    parameter IMG_HDISP   = 640,
    parameter IMG_VDISP   = 480
)(
    input         clk,
    input         rst_n,
    input         video_vsync,
    input         video_hsync,
    input         video_de,
    input  [23:0] video_data
);
integer iCode;
integer iBmpFileId;
integer iBmpWidth;
integer iBmpHight;
integer iBmpSize;
integer iDataStartIndex;
integer iIndex = 0;

localparam BMP_SIZE = 54 + IMG_HDISP * IMG_VDISP * 3 - 1;
reg [ 7:0] BmpHead        [0:53];
reg [ 7:0] Vip_BmpData    [0:BMP_SIZE];
reg [ 7:0] vip_pixel_data [0:BMP_SIZE-54];
reg [31:0] rBmpWord;

reg         video_vsync_d1 = 0;
reg  [11:0] frame_cnt      = 0;
reg  [31:0] PIC_cnt        = 0;
wire [ 7:0] PIC_img_R;
wire [ 7:0] PIC_img_G;
wire [ 7:0] PIC_img_B;
assign PIC_img_R = video_data[16+:8];
assign PIC_img_G = video_data[ 8+:8];
assign PIC_img_B = video_data[ 0+:8];
always@(posedge clk or negedge rst_n) begin
    if(!rst_n)
        video_vsync_d1 <= 0;
    else
        video_vsync_d1 <= video_vsync;
end
always@(posedge clk or negedge rst_n) begin
    if(!rst_n)
        frame_cnt <= 0;
    else if(video_vsync_d1 & !video_vsync)
        frame_cnt <= frame_cnt + 1;
end
always@(posedge clk or negedge rst_n) begin
    if(!rst_n)
        PIC_cnt <= 32'd0;
    else if(video_de) begin
        if(frame_cnt == START_FRAME - 1) begin
            PIC_cnt                   <= PIC_cnt + 3;
            vip_pixel_data[PIC_cnt+0] <= PIC_img_R;
            vip_pixel_data[PIC_cnt+1] <= PIC_img_G;
            vip_pixel_data[PIC_cnt+2] <= PIC_img_B;
        end
    end
end
initial begin
    for(iIndex = 0; iIndex < 54; iIndex = iIndex + 1) begin
        BmpHead[iIndex] = 0;
    end
    #2
    {BmpHead[1],BmpHead[0]} = {8'h4D,8'h42};                       //"BM" signature
    {BmpHead[5],BmpHead[4],BmpHead[3],BmpHead[2]} = BMP_SIZE + 1;  //File Size (Bytes)
    BmpHead[10] = 8'd54;                                           //Bitmap Data Offset
    BmpHead[14] = 8'h28;                                           //Bitmap Header Size
    {BmpHead[21],BmpHead[20],BmpHead[19],BmpHead[18]} = IMG_HDISP; //Width
    {BmpHead[25],BmpHead[24],BmpHead[23],BmpHead[22]} = IMG_VDISP; //Height
    BmpHead[26] = 8'd1;                                            //Number of Color Planes
    BmpHead[28] = 8'd24;                                           //Bits per Pixel
end
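The 54-byte header built above can be cross-checked against the BMP layout in Python. This sketch assumes the report's 640x480 24-bpp case (the module parameterizes the sizes) and writes each field at the same offsets the Verilog uses:

```python
import struct

# Cross-check of the BMP header fields written by the initial block:
# BITMAPFILEHEADER (14 bytes) + BITMAPINFOHEADER (40 bytes) for a
# 640x480 24-bpp image. Sizes are assumed, matching the testbench.
W, H = 640, 480
file_size = 54 + W * H * 3  # header + raw pixel bytes

head = bytearray(54)
head[0:2]   = b'BM'                          # signature 0x42,0x4D
head[2:6]   = struct.pack('<I', file_size)   # total file size, little-endian
head[10]    = 54                             # pixel data offset
head[14]    = 40                             # BITMAPINFOHEADER size (0x28)
head[18:22] = struct.pack('<i', W)           # width
head[22:26] = struct.pack('<i', H)           # height
head[26]    = 1                              # color planes
head[28]    = 24                             # bits per pixel

print(file_size)  # 921654 = 54 + 640*480*3
```

Note the Verilog's `BMP_SIZE + 1` equals the same `file_size`, since `BMP_SIZE` is defined as the last byte index (total size minus one).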
initial begin
    wait(frame_cnt == START_FRAME);
    iBmpFileId = $fopen(PIC_PATH,"wb+");
    for (iIndex = 0; iIndex < BMP_SIZE + 1; iIndex = iIndex + 1) begin
        if(iIndex < 54)
            Vip_BmpData[iIndex] = BmpHead[iIndex];
        else
            Vip_BmpData[iIndex] = vip_pixel_data[iIndex-54];
    end
    for (iIndex = 0; iIndex < BMP_SIZE + 1; iIndex = iIndex + 1) begin
        $fwrite(iBmpFileId,"%c",Vip_BmpData[iIndex]);
    end
    $fclose(iBmpFileId);
end
endmodule
References
[1] J. Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679-698, 1986.
[2] S. A. Jondhale and P. P. Sarang, "FPGA-Based Efficient Implementation of Canny Edge Detector," International Journal of Computer Applications, vol. 55, no. 7, pp. 36-40, 2012.
[3] M. H. Nguyen, D. T. Hoang, and J. S. Lee, "An FPGA-Based Hardware Accelerator for Real-Time Canny Edge Detection," in Proceedings of the IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2019, pp. 45-48.
[4] H. Farhidzadeh and M. Ahmadi, "Real-Time Edge Detection on FPGA Using a Parallel Canny Algorithm," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), 2017, pp. 1582-1585.
[5] D. G. Bailey, Design for Embedded Image Processing on FPGAs, John Wiley & Sons, 2011.