STES’s
RMD Sinhgad School of Engineering,
Warje-58
410255: Laboratory Practice V
High Performance Computing (410250)
Department of Computer Engineering
BE (2019 Course)
SINHGAD TECHNICAL EDUCATION SOCIETY’S
RMD SINHGAD SCHOOL OF ENGINEERING
Warje, Pune 411058
Department of Computer Engineering
LABORATORY MANUAL
AY: 2022-23
High Performance Computing
BE Computer Engineering
Semester – II
Subject Code - 410250
TEACHING SCHEME
Practical: 2 Hrs / Week
CREDIT
01
EXAMINATION
TW: 50 Marks
PR:50 Marks
Prepared By:
Mrs. Pradnya Kasture & Mrs. Shafali Gupta
(Assistant Professor, Department of Computer Engineering)
Course Objectives
● To understand and implement searching and sorting algorithms
● To learn the fundamentals of GPU Computing in the CUDA environment
● To illustrate the concepts of Artificial Intelligence/Machine Learning (AI/ML)
● To understand hardware acceleration
● To implement different deep learning models
Course Outcomes
CO1: Analyze and measure the performance of sequential and parallel algorithms.
CO2: Design and implement solutions for multicore/distributed/parallel environments.
CO3: Identify and apply suitable algorithms to solve AI/ML problems.
CO4: Apply deep neural network techniques for implementing linear regression and classification.
CO5: Apply convolutional neural network (CNN) techniques for implementing deep learning models.
CO6: Design and develop Recurrent Neural Networks (RNN) for prediction.
Guidelines for Laboratory Conduction
● A list of recommended programming assignments and sample mini-projects is provided for reference.
● Referring to these, the Course Teacher or Lab Instructor may frame the assignments/mini-projects by understanding the prerequisites, technological aspects, utility and recent trends related to the respective courses.
● Preferably there should be multiple sets of assignments/mini-projects, distributed among batches of students.
● Real-world problem/application-based assignments/mini-projects create interest among learners, serving as a foundation for future research or startup business projects.
● Mini-projects can be completed in groups of 2 to 3 students.
● A software engineering approach with proper documentation is to be strictly followed.
● Use of open-source software is to be encouraged.
● The instructor may also set one assignment or mini-project suitable to the respective course beyond the scope of the syllabus.
● Operating system recommended: 64-bit open-source Linux or its derivative
● Programming languages: object-oriented languages C++/Java/Python/R
● Programming tools recommended: Front end: Java/Perl/PHP/Python/Ruby/.NET
● Back end: MongoDB/MySQL/Oracle
● Database connectivity: ODBC/JDBC
Group 1
Assignment 1:
Design and implement parallel Breadth First Search and Depth First Search based on existing algorithms using OpenMP. Use a tree or an undirected graph for BFS and DFS.
Parallel Breadth First Search
1. To design and implement parallel breadth first search, you will need to divide the graph into smaller sub-graphs and assign each sub-graph to a different processor or thread.
2. Each processor or thread will then perform a breadth first search on its assigned sub-graph concurrently with the other processors or threads.
3. Two methods: vertex by vertex OR level by level (a level-by-level sketch follows below).
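As a hedged illustration of the level-by-level method, here is a minimal OpenMP sketch (not the manual's program; the function name bfs_levels and the per-thread buffering are assumptions for illustration). Each level's frontier is expanded in parallel, and the vertices discovered by each thread are collected into the next frontier.

#include <vector>
#include <omp.h>

// Level-synchronous BFS sketch: returns the BFS level of every vertex.
std::vector<int> bfs_levels(const std::vector<std::vector<int>>& graph, int start) {
    std::vector<int> level(graph.size(), -1);
    std::vector<int> frontier{start};
    level[start] = 0;
    int depth = 0;
    while (!frontier.empty()) {
        std::vector<int> next;
        #pragma omp parallel
        {
            std::vector<int> local;                 // per-thread buffer avoids contention
            #pragma omp for nowait
            for (int i = 0; i < (int)frontier.size(); i++) {
                for (int v : graph[frontier[i]]) {
                    if (level[v] == -1) {           // benign race: a vertex may be
                        level[v] = depth + 1;       // claimed twice within one level,
                        local.push_back(v);         // which wastes work but stays correct
                    }
                }
            }
            #pragma omp critical
            next.insert(next.end(), local.begin(), local.end());
        }
        frontier.swap(next);
        depth++;
    }
    return level;
}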
Breadth First Search
To design and implement parallel breadth first search using OpenMP, you can use the existing breadth first search algorithm and parallelize it using OpenMP's parallelization constructs.
Program Sample:

#include <iostream>
#include <vector>
#include <queue>
#include <omp.h>
using namespace std;

void bfs(vector<vector<int>>& graph, int start, vector<bool>& visited) {
    queue<int> q;
    q.push(start);
    visited[start] = true;
    #pragma omp parallel
    {
        #pragma omp single
        {
            while (!q.empty()) {
                int vertex = q.front();
                q.pop();
                // Create one task per neighbor of the current vertex.
                for (int neighbor : graph[vertex]) {
                    #pragma omp task firstprivate(neighbor)
                    {
                        // Protect the shared queue and visited array.
                        #pragma omp critical
                        {
                            if (!visited[neighbor]) {
                                visited[neighbor] = true;
                                q.push(neighbor);
                            }
                        }
                    }
                }
                // Wait for the tasks so the queue is stable before the next pop.
                #pragma omp taskwait
            }
        }
    }
}

void parallel_bfs(vector<vector<int>>& graph, int start) {
    vector<bool> visited(graph.size(), false);
    bfs(graph, start, visited);
}
1. In this implementation, the parallel_bfs function takes in a graph represented as an adjacency list, where each element in the list is a vector of neighboring vertices, and a starting vertex.
2. The bfs function uses a queue to keep track of the vertices to visit, and a boolean visited array to keep track of which vertices have been visited. The #pragma omp parallel directive creates a parallel region and the #pragma omp single directive creates a single execution context within that region.
3. Inside the while loop, the #pragma omp task directive creates a new task for each neighbor of the current vertex, allowing the neighbors to be examined in parallel with other tasks.
4. The firstprivate clause is used to ensure that each task has its own copy of the neighbor variable.
5. This is just one possible implementation, and there are many ways to improve it depending on the specific requirements of your application. For example, omp atomic or omp critical can be used to protect the shared queue, as is done here.
Parallel Depth-First Search
• Different subtrees can be searched concurrently.
• Subtrees can be very different in size.
• Estimate the size of a subtree rooted at a node.
• Dynamic load balancing is required.
Parallel Depth-First Search: Dynamic Load Balancing
• When a processor runs out of work, it gets more work from another processor.
• This is done using work requests and responses in message passing machines
and locking and extracting work in shared address space machines.
• On reaching final state at a processor, all processors terminate.
• Unexplored states can be conveniently stored as local stacks at processors.
• The entire space is assigned to one processor to begin with.
Parameters in Parallel DFS: Work Splitting
• Work is split by splitting the stack into two.
• Ideally, we do not want either of the split pieces to be small.
• Select nodes near the bottom of the stack (node splitting), or
• Select some nodes from each level (stack splitting).
• The second strategy generally yields a more even split of the space.
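As a rough sketch of the second strategy (a hypothetical helper, not from the manual), a donor's stack of unexplored nodes can be split by handing over every other node, so both halves sample all levels of the search space:

#include <vector>

// Give away roughly every other node from the donor's stack.
std::vector<int> split_stack(std::vector<int>& stack) {
    std::vector<int> donated, kept;
    for (std::size_t i = 0; i < stack.size(); i++) {
        if (i % 2 == 0)
            kept.push_back(stack[i]);
        else
            donated.push_back(stack[i]);
    }
    stack.swap(kept);
    return donated;   // sent to the requesting processor
}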
Load-Balancing Schemes
• Asynchronous round robin: each processor maintains a counter and makes requests in a round-robin fashion.
• Global round robin: the system maintains a global counter and requests are made in a round-robin fashion, globally.
• Random polling: request a randomly selected processor for work.
Analyzing DFS
• We can’t compute, analytically, the serial work W or parallel time. Instead, we
quantify total overhead To in terms of W to compute scalability.
• For dynamic load balancing, idling time is subsumed by communication.
• We must quantify the total number of requests in the system.
Load-Balancing Schemes
• Asynchronous round robin has poor performance because it makes a large number of work requests.
• Global round robin has poor performance because of contention at the counter, although it makes the least number of requests.
• Random polling strikes a desirable compromise.
Termination Detection
• Processor P0 has all the work and a weight of one is associated with it. When its work is
partitioned and sent to another processor, processor P0 retains half of the weight and
gives half of it to the processor receiving the work.
• If Pi is the recipient processor and wi is the weight at processor Pi, then after the first
work transfer, both w0 and wi are 0.5.
• Each time the work at a processor is partitioned, the weight is halved. When a processor
completes its computation, it returns its weight to the processor from which it received
work.
• Termination is signaled when the weight w0 at processor P0 becomes one and processor P0 has finished its work.
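A minimal sketch of this weight bookkeeping follows (hypothetical helper names, not from the manual; practical implementations use integer credits rather than doubles to avoid rounding error):

#include <cassert>

// Each processor tracks the weight it currently holds.
struct Processor {
    double weight = 0.0;
    bool   idle   = true;
};

// On a work transfer, the donor keeps half its weight and sends the other half.
double split_weight(Processor& donor) {
    donor.weight /= 2.0;
    return donor.weight;
}

// On completing its work, a processor returns its weight to the processor
// from which it received the work.
void return_weight(Processor& from, Processor& to) {
    to.weight += from.weight;
    from.weight = 0.0;
    from.idle = true;
}

int main() {
    Processor p0, p1;
    p0.weight = 1.0; p0.idle = false;   // P0 starts with all work and weight 1
    p1.weight = split_weight(p0);       // after the transfer, w0 = w1 = 0.5
    p1.idle = false;
    return_weight(p1, p0);              // P1 finishes and returns its weight
    assert(p0.weight == 1.0);           // termination: w0 == 1 and P0 is idle
    return 0;
}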
Program
#include <iostream>
#include <vector>
#include <stack>
#include <omp.h>
using namespace std;

void dfs(vector<vector<int>>& graph, int start, vector<bool>& visited) {
    stack<int> s;
    s.push(start);
    visited[start] = true;
    #pragma omp parallel
    {
        #pragma omp single
        {
            while (!s.empty()) {
                int vertex = s.top();
                s.pop();
                // Create one task per neighbor of the current vertex.
                for (int neighbor : graph[vertex]) {
                    #pragma omp task firstprivate(neighbor)
                    {
                        // Protect the shared stack and visited array.
                        #pragma omp critical
                        {
                            if (!visited[neighbor]) {
                                visited[neighbor] = true;
                                s.push(neighbor);
                            }
                        }
                    }
                }
                // Wait for the tasks so the stack is stable before the next pop.
                #pragma omp taskwait
            }
        }
    }
}

void parallel_dfs(vector<vector<int>>& graph, int start) {
    vector<bool> visited(graph.size(), false);
    dfs(graph, start, visited);
}
1. In this implementation, the parallel_dfs function takes in a graph represented as an adjacency list, where each element in the list is a vector of neighboring vertices, and a starting vertex.
2. The dfs function uses a stack to keep track of the vertices to visit, and a boolean visited array to keep track of which vertices have been visited.
3. The #pragma omp parallel directive creates a parallel region and the #pragma omp single directive creates a single execution context within that region.
4. Inside the while loop, the #pragma omp task directive creates a new task for each neighbor of the current vertex, allowing the neighbors to be examined in parallel with other tasks.
5. The firstprivate clause is used to ensure that each task has its own copy of the neighbor variable.
6. This implementation is suitable for both a tree and an undirected graph, since both can be represented as an adjacency list, and the algorithm uses a stack to traverse the graph.
7. This is just one possible implementation, and there are many ways to improve it depending on the specific requirements of your application. For example, omp atomic or omp critical can be used to protect the shared stack, as is done here.
Assignment 2:
Write a program to implement parallel Bubble Sort and Merge Sort using OpenMP. Use existing algorithms and measure the performance of sequential and parallel algorithms.
Bubble Sort
1. The complexity of bubble sort is Θ(n²).
2. Bubble sort is difficult to parallelize since the algorithm has no concurrency.
3. A simple variant, though, uncovers the concurrency.
Odd-Even Transposition
[Figure: sorting n = 8 elements using the odd-even transposition sort algorithm; during each phase, n = 8 elements are compared.]
[Figure: the sequential odd-even transposition sort algorithm.]
Odd-Even Transposition
1. After n phases of odd-even exchanges, the sequence is sorted.
2. Each phase of the algorithm (either odd or even) requires Θ(n) comparisons.
3. Serial complexity is Θ(n²).
Parallel Odd-Even Transposition
1. Consider the one-item-per-processor case.
2. There are n iterations; in each iteration, each processor does one compare-exchange.
3. The parallel run time of this formulation is Θ(n).
4. This is cost optimal with respect to the base serial algorithm, but not with respect to the optimal one.

[Figure: parallel formulation of odd-even transposition.]
Parallel Odd-Even Transposition
1. Consider a block of n/p elements per processor.
2. The first step is a local sort.
3. In each subsequent step, the compare-exchange operation is replaced by the compare-split operation (see the sketch below).
4. The parallel run time of this formulation is T_P = Θ((n/p) log(n/p)) + Θ(n), accounting for the local sort followed by p compare-split phases of Θ(n/p) each.
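The compare-split operation itself can be sketched as follows (a hypothetical helper, not from the manual): each partner merges its own sorted block with the block received from its neighbor, then the lower-ranked process keeps the smaller half and the higher-ranked process keeps the larger half.

#include <algorithm>
#include <vector>

// "mine" and "theirs" are sorted blocks of equal size; after the call,
// "mine" holds either the smaller or the larger half of their union.
void compare_split(std::vector<int>& mine, const std::vector<int>& theirs, bool keep_low) {
    std::vector<int> merged(mine.size() + theirs.size());
    std::merge(mine.begin(), mine.end(), theirs.begin(), theirs.end(), merged.begin());
    if (keep_low)
        std::copy(merged.begin(), merged.begin() + mine.size(), mine.begin());
    else
        std::copy(merged.end() - mine.size(), merged.end(), mine.begin());
}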
Bubble Sort: Odd-Even Transposition (OpenMP)
#include <iostream>
#include <vector>
#include <algorithm>
#include <omp.h>
using namespace std;

void bubble_sort_odd_even(vector<int>& arr) {
    int n = arr.size();
    bool isSorted = false;
    while (!isSorted) {
        isSorted = true;
        // Even phase: compare-exchange pairs starting at even indices.
        #pragma omp parallel for
        for (int i = 0; i < n - 1; i += 2) {
            if (arr[i] > arr[i + 1]) {
                swap(arr[i], arr[i + 1]);
                isSorted = false;
            }
        }
        // Odd phase: compare-exchange pairs starting at odd indices.
        #pragma omp parallel for
        for (int i = 1; i < n - 1; i += 2) {
            if (arr[i] > arr[i + 1]) {
                swap(arr[i], arr[i + 1]);
                isSorted = false;
            }
        }
    }
}
int main() {
    vector<int> arr = {5, 2, 9, 1, 7, 6, 8, 3, 4};
    double start, end;
    // Measure performance of parallel bubble sort using odd-even transposition
    start = omp_get_wtime();
    bubble_sort_odd_even(arr);
    end = omp_get_wtime();
    cout << "Parallel bubble sort using odd-even transposition time: " << end - start << endl;
    return 0;
}
1. This program uses OpenMP to parallelize the bubble sort algorithm.
2. The #pragma omp parallel for directive tells the compiler to create a team of threads to execute the for loop within the block in parallel.
3. Each thread works on a different iteration of the loop, in this case comparing and swapping elements of the array.
4. The bubble_sort_odd_even function takes in an array and sorts it using the odd-even transposition algorithm. The outer while loop continues until the array is sorted. Inside the loop, each #pragma omp parallel for directive creates a parallel region and divides the loop iterations among the available threads, which perform the compare-and-swap operations in parallel.
5. The two #pragma omp parallel for loops inside the while loop, one for even indices and one for odd indices, allow the threads to sort the even- and odd-indexed pairs simultaneously and avoid dependencies between iterations.
6. The main function creates a sample array and calls bubble_sort_odd_even to sort it, measuring the elapsed time with omp_get_wtime.
7. This is skeleton code; it may need some modification to work with specific inputs and requirements.
8. It is worth noting that bubble sort is not an efficient sorting algorithm, especially for large inputs, and it may not scale well with a larger number of threads. Parallelizing bubble sort does not yield a significant performance improvement due to the inherently sequential nature of the algorithm.
Merge Sort
Parallel Merge Sort
Given a set of elements A = {a1, a2, ..., an}, A_odd and A_even are defined as the sets of elements of A with odd and even indices, respectively. For example, A_odd = {a1, a3, a5, ...} and A_even = {a2, a4, a6, ...}. Similarly, let B = {b1, b2, ..., bn} be another set of elements. We can then define the merge operation as:

Merge(A, B) = {a1, b1, a2, b2, a3, b3, ..., an, bn}

For example, if A = {1, 2, 3, 4} and B = {5, 6, 7, 8}, then Merge({1, 2, 3, 4}, {5, 6, 7, 8}) = {1, 5, 2, 6, 3, 7, 4, 8}.

Join(A, B) = (Merge(A, B), Odd-Even(A, B))

Example: suppose the set of elements S = {2, 3, 6, 10, 15, 4, 5, 8} and we start with A = {2, 6, 10, 15} and B = {3, 4, 5, 8}, two sorted sets of elements.
#include <iostream>
#include <vector>
#include <omp.h>
using namespace std;

void merge(vector<int>& arr, int l, int m, int r) {
    int i, j, k;
    int n1 = m - l + 1;
    int n2 = r - m;
    vector<int> L(n1), R(n2);
    for (i = 0; i < n1; i++) {
        L[i] = arr[l + i];
    }
    for (j = 0; j < n2; j++) {
        R[j] = arr[m + 1 + j];
    }
    i = 0;
    j = 0;
    k = l;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k++] = L[i++];
        } else {
            arr[k++] = R[j++];
        }
    }
    // Copy any remaining elements of L and R.
    while (i < n1) {
        arr[k++] = L[i++];
    }
    while (j < n2) {
        arr[k++] = R[j++];
    }
}
void merge_sort(vector<int>& arr, int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2;
        #pragma omp task
        merge_sort(arr, l, m);
        #pragma omp task
        merge_sort(arr, m + 1, r);
        // Both halves must be sorted before they are merged.
        #pragma omp taskwait
        merge(arr, l, m, r);
    }
}

void parallel_merge_sort(vector<int>& arr) {
    #pragma omp parallel
    {
        #pragma omp single
        merge_sort(arr, 0, arr.size() - 1);
    }
}
int main() {
    vector<int> arr = {5, 2, 9, 1, 7, 6, 8, 3, 4};
    double start, end;
    // Measure performance of sequential merge sort
    start = omp_get_wtime();
    merge_sort(arr, 0, arr.size() - 1);
    end = omp_get_wtime();
    cout << "Sequential merge sort time: " << end - start << endl;
    // Measure performance of parallel merge sort
    arr = {5, 2, 9, 1, 7, 6, 8, 3, 4};
    start = omp_get_wtime();
    parallel_merge_sort(arr);
    end = omp_get_wtime();
    cout << "Parallel merge sort time: " << end - start << endl;
    return 0;
}
Assignment 3:
Implement Min, Max, Sum and Average operations using Parallel Reduction.
#include <iostream>
#include <vector>
#include <climits>
#include <omp.h>
using namespace std;
void min_reduction(vector<int>& arr) {
int min_value = INT_MAX;
#pragma omp parallel for reduction(min: min_value)
for (int i = 0; i < arr.size(); i++) {
if (arr[i] < min_value) {
min_value = arr[i];
}
}
cout << "Minimum value: " << min_value << endl;
}
void max_reduction(vector<int>& arr) {
int max_value = INT_MIN;
#pragma omp parallel for reduction(max: max_value)
for (int i = 0; i < arr.size(); i++) {
if (arr[i] > max_value) {
max_value = arr[i];
}
}
cout << "Maximum value: " << max_value << endl;
}
void sum_reduction(vector<int>& arr) {
int sum = 0;
#pragma omp parallel for reduction(+: sum)
for (int i = 0; i < arr.size(); i++) {
sum += arr[i];
}
cout << "Sum: " << sum << endl;
}
void average_reduction(vector<int>& arr) {
int sum = 0;
#pragma omp parallel for reduction(+: sum)
for (int i = 0; i < arr.size(); i++) {
sum += arr[i];
}
cout << "Average: " << (double)sum / arr.size() << endl;
}
int main() {
vector<int> arr = {5, 2, 9, 1, 7, 6, 8, 3, 4};
min_reduction(arr);
max_reduction(arr);
sum_reduction(arr);
average_reduction(arr);
}
1. The min_reduction function finds the minimum value in the input array using the #pragma omp parallel for reduction(min: min_value) directive, which creates a parallel region and divides the loop iterations among the available threads. Each thread performs the comparison operation in parallel and updates min_value if a smaller value is found.
2. Similarly, the max_reduction function finds the maximum value in the array, the sum_reduction function finds the sum of the elements of the array, and the average_reduction function finds the average of the elements of the array by dividing the sum by the size of the array.
3. The reduction clause is used to combine the results of multiple threads into a single value. The min and max operators are used for the min_reduction and max_reduction functions, respectively, and the + operator is used for the sum_reduction and average_reduction functions. The main function creates a vector and calls min_reduction, max_reduction, sum_reduction, and average_reduction to compute the min, max, sum and average respectively.
Assignment 4: Write a CUDA program for:
1. Addition of two large vectors
2. Matrix multiplication using CUDA C

Vector Addition
#include <iostream>
#include <cuda_runtime.h>
__global__ void addVectors(int* A, int* B, int* C, int n) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < n) {
C[i] = A[i] + B[i];
}
}
int main() {
int n = 1000000;
int* A, * B, * C;
int size = n * sizeof(int);
// Allocate memory on the host
cudaMallocHost(&A, size);
cudaMallocHost(&B, size);
cudaMallocHost(&C, size);
// Initialize the vectors
for (int i = 0; i < n; i++) {
A[i] = i;
B[i] = i * 2;
}
// Allocate memory on the device
int* dev_A, * dev_B, * dev_C;
cudaMalloc(&dev_A, size);
cudaMalloc(&dev_B, size);
cudaMalloc(&dev_C, size);
// Copy data from host to device
cudaMemcpy(dev_A, A, size, cudaMemcpyHostToDevice);
cudaMemcpy(dev_B, B, size, cudaMemcpyHostToDevice);
// Launch the kernel
int blockSize = 256;
int numBlocks = (n + blockSize - 1) / blockSize;
addVectors<<<numBlocks, blockSize>>>(dev_A, dev_B, dev_C, n);
cudaDeviceSynchronize();
// Copy data from device to host
cudaMemcpy(C, dev_C, size, cudaMemcpyDeviceToHost);
// Print the first few results
for (int i = 0; i < 10; i++) {
    std::cout << C[i] << " ";
}
std::cout << std::endl;
// Free memory
cudaFree(dev_A);
cudaFree(dev_B);
cudaFree(dev_C);
cudaFreeHost(A);
cudaFreeHost(B);
cudaFreeHost(C);
return 0;
}
1. In this program, the `addVectors` kernel takes in the two input vectors `A` and `B`, the output vector `C`, and the size of the vectors `n`. The kernel uses the `blockIdx.x` and `threadIdx.x` variables to calculate the index `i` of the current thread. If the index is less than `n`, the kernel performs the addition operation `C[i] = A[i] + B[i]`.
2. In the `main` function, the program first allocates memory for the input and output vectors on the host and initializes them. Then it allocates memory for the vectors on the device and copies the data from the host to the device using `cudaMemcpy`.
3. Next, it launches the kernel with enough blocks of 256 threads each to cover all `n` elements, copies the result back to the host, prints the first few values, and frees the device and host memory.
Matrix multiplication
Matrix-Matrix Multiplication
● Consider two n × n matrices A and B partitioned into p blocks Ai,j and Bi,j (0 ≤ i, j < √p) of size (n/√p) × (n/√p) each.
● Process Pi,j initially stores Ai,j and Bi,j and computes block Ci,j of the result matrix.
● Computing submatrix Ci,j requires all submatrices Ai,k and Bk,j for 0 ≤ k < √p.
● All-to-all broadcast blocks of A along rows and B along columns.
● Perform local submatrix multiplication.
Example
Consider the 4 × 4 matrices

A = | 2 1 5 3 |        B = | 9  2  4  4 |
    | 6 1 2 3 |            | 1  9  8 -8 |
    | 0 7 1 6 |            | 3  6  7  2 |
    | 4 5 6 5 |            | 4  0 -8  5 |
Matrices divided into 4 blocks
[Figure: A and B each partitioned into four 2 × 2 blocks.]
Matrix alignment
[Figure: the initial alignment of the blocks of A and B.]
[Figure: the first round of local block multiplications, producing partial results for C0,0, C0,1, C1,0 and C1,1.]
Next, a shifting operation is performed on A and B: the blocks of A are shifted one step left and the blocks of B one step up.
[Figure: the block layout of A and B after the shift.]
After performing the second local matrix multiplication, the partial products are accumulated to obtain the final result blocks C0,0, C0,1, C1,0 and C1,1.
[Figure: the second round of local block multiplications and the accumulated final C blocks.]
#include <cuda_runtime.h>
#include <iostream>
__global__ void matmul(int* A, int* B, int* C, int N) {
int Row = blockIdx.y*blockDim.y+threadIdx.y;
int Col = blockIdx.x*blockDim.x+threadIdx.x;
if (Row < N && Col < N) {
int Pvalue = 0;
for (int k = 0; k < N; k++) {
Pvalue += A[Row*N+k] * B[k*N+Col];
}
C[Row*N+Col] = Pvalue;
}
}
int main() {
int N = 512;
int size = N * N * sizeof(int);
int* A, * B, * C;
int* dev_A, * dev_B, * dev_C;
cudaMallocHost(&A, size);
cudaMallocHost(&B, size);
cudaMallocHost(&C, size);
cudaMalloc(&dev_A, size);
cudaMalloc(&dev_B, size);
cudaMalloc(&dev_C, size);
// Initialize matrices A and B
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        A[i*N+j] = i*N+j;
        B[i*N+j] = j*N+i;
    }
}
cudaMemcpy(dev_A, A, size, cudaMemcpyHostToDevice);
cudaMemcpy(dev_B, B, size, cudaMemcpyHostToDevice);
dim3 dimBlock(16, 16);
dim3 dimGrid(N/dimBlock.x, N/dimBlock.y);
matmul<<<dimGrid, dimBlock>>>(dev_A, dev_B, dev_C, N);
cudaMemcpy(C, dev_C, size, cudaMemcpyDeviceToHost);
// Print the top-left 10 x 10 corner of the result
for (int i = 0; i < 10; i++) {
    for (int j = 0; j < 10; j++) {
        std::cout << C[i*N+j] << " ";
    }
    std::cout << std::endl;
}
// Free memory
cudaFree(dev_A);
cudaFree(dev_B);
cudaFree(dev_C);
cudaFreeHost(A);
cudaFreeHost(B);
cudaFreeHost(C);
return 0;
}
• In this program, the `matmul` kernel takes in the two input matrices `A` and `B`, the output matrix `C`, and the size of the matrices `N`. The kernel uses the `blockIdx.x`, `blockIdx.y`, `threadIdx.x`, and `threadIdx.y` variables to calculate the indices of the current thread. If the indices are less than `N`, the kernel performs the matrix multiplication operation `Pvalue += A[Row*N+k] * B[k*N+Col]` and stores Pvalue in `C[Row*N+Col]`.
• In the `main` function, the program first allocates memory for the input and output matrices on the host and initializes them. Then it allocates memory for the matrices on the device and copies the data from the host to the device using `cudaMemcpy`.
• Next, the program launches the kernel with the appropriate grid and block dimensions. The kernel uses a 2D grid of thread blocks to perform the matrix multiplication in parallel.
• Finally, it copies the data from device to host using cudaMemcpy and prints the result using nested for loops. It also frees the memory used.
Group 2
Measuring the performance enhancement of a parallel Quicksort algorithm using MPI
1. Compare the execution time of the parallel algorithm with that of the sequential algorithm.
2. Divide the input array into equal-sized partitions, and have each process sort its partition using the standard Quicksort algorithm.
3. The processes then exchange their partitions with each other, and use a parallel merge operation to merge the sorted partitions into a single, sorted array.
4. To evaluate the performance enhancement, run the parallel Quicksort algorithm with different numbers of processes and compare the execution time with the execution time of the sequential Quicksort algorithm.
5. You can also vary the size of the input array and observe how the execution time changes.
6. Additionally, you can measure the speedup and efficiency of the parallel algorithm by using the following formulas:

Speedup = T_seq / T_par
Efficiency = Speedup / P

where: T_seq = execution time of the sequential algorithm
T_par = execution time of the parallel algorithm
P = number of processes used
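As a hypothetical worked example of these formulas: if the sequential quicksort takes T_seq = 8.0 s and the parallel version with P = 4 processes takes T_par = 2.5 s, then Speedup = 8.0 / 2.5 = 3.2 and Efficiency = 3.2 / 4 = 0.8, i.e. each process is utilized at 80% on average.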
Parallel Quicksort
1. Let's start with recursive decomposition: the list is partitioned serially and each of the subproblems is handled by a different processor.
2. The time for this algorithm is lower-bounded by Ω(n)!
3. Can we parallelize the partitioning step? In particular, if we can use n processors to partition a list of length n around a pivot in O(1) time, we have a winner.
4. This is difficult to do on real machines, though.

PRAM Formulation
1. We assume a CRCW (concurrent read, concurrent write) PRAM, with concurrent writes resulting in an arbitrary write succeeding.
2. The formulation works by creating pools of processors. Every processor is assigned to the same pool initially and has one element.
3. Each processor attempts to write its element to a common location (for the pool).
4. Each processor tries to read back the location. If the value read back is greater than the processor's value, it assigns itself to the 'left' pool; else, it assigns itself to the 'right' pool.
5. Each pool performs this operation recursively.
6. Note that the algorithm generates a tree of pivots. The depth of the tree is the expected parallel runtime. The average value is O(log n).
Parallelizing Quicksort: Shared Address Space Formulation
1. The only thing we have not described is the global reorganization (merging) of local lists to form L and U.
2. The problem is one of determining the right location for each element in the merged list.
3. Each processor computes the number of elements locally less than and greater than the pivot.
4. It computes two sum-scans to determine the starting location for its elements in the merged L and U lists (see the prefix-sum sketch below).
5. Once it knows the starting locations, it can write its elements safely.
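A minimal serial stand-in for the sum-scan (prefix sum) used in step 4 follows (a hypothetical helper, not from the manual): each process's count of elements smaller than the pivot is exclusive-scanned to obtain the start offset of its portion of L.

#include <numeric>
#include <vector>

// counts[i] = number of local elements less than the pivot on process i;
// the exclusive prefix sum gives each process its write offset into L.
std::vector<int> start_offsets(const std::vector<int>& counts) {
    std::vector<int> offsets(counts.size(), 0);
    std::exclusive_scan(counts.begin(), counts.end(), offsets.begin(), 0);
    return offsets;   // e.g. {3, 1, 2} -> {0, 3, 4}
}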
1. The parallel time depends on the split and merge time, and the quality of the pivot.
2. The latter is an issue independent of parallelism, so we focus on the first aspect, assuming ideal pivot selection.
3. The algorithm executes in four steps: (i) determine and broadcast the pivot; (ii) locally rearrange the array assigned to each process; (iii) determine the locations in the globally rearranged array that the local elements will go to; and (iv) perform the global rearrangement.
4. The first step takes time Θ(log p), the second Θ(n/p), the third Θ(log p), and the fourth Θ(n/p).
5. The overall complexity of splitting an n-element array is Θ(n/p) + Θ(log p).
#include <iostream>
#include <algorithm>
#include <mpi.h>
using namespace std;
// Function to partition the array
int partition(int arr[], int low, int high) {
int pivot = arr[high];
int i = (low - 1);
for (int j = low; j <= high - 1; j++) {
if (arr[j] <= pivot) {
i++;
swap(arr[i], arr[j]);
}
}
swap(arr[i + 1], arr[high]);
return (i + 1);
}
// Function to perform quicksort on the partition
void quicksort(int arr[], int low, int high) {
if (low < high) {
int pivot = partition(arr, low, high);
quicksort(arr, low, pivot - 1);
quicksort(arr, pivot + 1, high);
}
}
int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int n = 100;
    int* arr = new int[n];
    int* recvbuf = new int[n];
    // Fill the array with random values on the root process
    if (rank == 0) {
        for (int i = 0; i < n; i++) {
            arr[i] = rand() % 100;
        }
    }
    // Divide the array into equal-sized partitions for each process
    int sub_arr_size = n / size;
    int* sub_arr = new int[sub_arr_size];
    MPI_Scatter(arr, sub_arr_size, MPI_INT, sub_arr, sub_arr_size, MPI_INT, 0, MPI_COMM_WORLD);
    // Sort the partition using quicksort
    quicksort(sub_arr, 0, sub_arr_size - 1);
    // Gather the sorted partitions from each process
    MPI_Gather(sub_arr, sub_arr_size, MPI_INT, recvbuf, sub_arr_size, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        // Print the gathered array (the sorted partitions still need a final merge)
        for (int i = 0; i < n; i++) {
            cout << recvbuf[i] << " ";
        }
        cout << endl;
        // Measure the execution time of the parallel quicksort algorithm
        double start_time = MPI_Wtime();
        // Perform the above steps again
        double end_time = MPI_Wtime();
        double parallel_execution_time = end_time - start_time;
        // Measure the execution time of the sequential quicksort algorithm
        start_time = MPI_Wtime();
        quicksort(arr, 0, n - 1);
        end_time = MPI_Wtime();
        double sequential_execution_time = end_time - start_time;
        // Calculate speedup and efficiency
        double speedup = sequential_execution_time / parallel_execution_time;
        double efficiency = speedup / size;
        cout << "Sequential execution time: " << sequential_execution_time << endl;
        cout << "Parallel execution time: " << parallel_execution_time << endl;
        cout << "Speedup: " << speedup << endl;
        cout << "Efficiency: " << efficiency << endl;
    }
    delete[] arr;
    delete[] recvbuf;
    delete[] sub_arr;
    MPI_Finalize();
    return 0;
}
Implement Huffman Encoding on GPU
#include <iostream>
#include <climits>
#include <cuda_runtime.h>

__global__ void buildHuffmanTree(int* frequencies, int* tree, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {
        // Find the two lowest frequency nodes
        int min1 = INT_MAX, min2 = INT_MAX;
        int minIndex1 = -1, minIndex2 = -1;
        for (int j = 0; j < n; j++) {
            if (frequencies[j] != 0 && frequencies[j] < min1) {
                min2 = min1;
                minIndex2 = minIndex1;
                min1 = frequencies[j];
                minIndex1 = j;
            } else if (frequencies[j] != 0 && frequencies[j] < min2) {
                min2 = frequencies[j];
                minIndex2 = j;
            }
        }
        // Combine the two lowest frequency nodes into a new node
        int newNodeIndex = n + i;
        frequencies[newNodeIndex] = min1 + min2;
        tree[newNodeIndex] = minIndex1;
        tree[newNodeIndex + n] = minIndex2;
    }
}

int main() {
    int n = 256;
    int* frequencies;
    int* tree;
    // Leave room for the internal nodes written at indices n..2n-1
    cudaMalloc(&frequencies, 2 * n * sizeof(int));
    // tree[newNodeIndex + n] can reach index 3n - 1
    cudaMalloc(&tree, 3 * n * sizeof(int));
    // Initialize frequencies
    // ...
    int numBlocks = (n + 256 - 1) / 256;
    buildHuffmanTree<<<numBlocks, 256>>>(frequencies, tree, n);
    // Encode the data using the Huffman tree
    // ...
    cudaFree(frequencies);
    cudaFree(tree);
    return 0;
}
1. The program defines a kernel function buildHuffmanTree that takes in three parameters: an
array of frequencies, an array of the Huffman tree, and the number of symbols. The kernel
function is executed on the GPU, with each thread working on building a part of the tree.
2. The kernel function first finds the two lowest frequency nodes in the array of frequencies,
and combines them into a new node. The new node is added to the array of the Huffman
tree, and its frequency is the sum of the frequencies of the two lowest frequency nodes.
3. In the main function, the program allocates memory on the GPU for the arrays of
frequencies and the Huffman tree. Then it initializes the array of frequencies (this part of
the code is not shown in the example). Finally, it launches the buildHuffmanTree kernel
function on the GPU by specifying the number of blocks and threads per block, and
passing in the necessary parameters. After the kernel function finishes executing, the
program frees the memory allocated on the GPU.
SINHGAD TECHNICAL EDUCATION SOCIETY’S
RMD SINHGAD SCHOOL OF ENGINEERING
WARJE, PUNE - 411058
Department of Computer Engineering
Laboratory Manual of
Laboratory Practice VI
(BE 2019 Pattern)
SINHGAD TECHNICAL EDUCATION SOCIETY’S
RMD SINHGAD SCHOOL OF ENGINEERING
Warje, Pune 411058
Department of Computer Engineering
LABORATORY MANUAL
AY: 2022-23
IMAGE PROCESSING LABORATORY
BE Computer Engineering
Semester – II
Subject Code – 410252 (B)
TEACHING SCHEME
Practical: 2 Hrs / Week
CREDIT
01
EXAMINATION
TW: 50 Marks
Prepared By:
Mr. Saurabh Vijay Parhad
(Assistant Professor, Department of Computer Engineering)
Vision
“We are committed to produce not only good engineers but good human beings, also.”
Mission
“Holistic development of students and teachers is what we
believe in and work for. We strive to achieve this by
imbibing a unique value system, transparent work culture,
excellent academic and physical environment conducive to
learning, creativity and technology transfer. Our mandate is
to generate, preserve and share knowledge for developing a
vibrant society.”
Programme Outcomes (POs)
PO1. Engineering knowledge: Apply the knowledge of mathematics,
science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2. Problem analysis: Identify, formulate, research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.
PO4. Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of the
information to provide valid conclusions.
PO5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modelling to complex engineering activities with an
understanding of the limitations.
PO6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7. Environment and sustainability: Understand the impact of the professional engineering solutions
in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.
PO8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms
of the engineering practice.
PO9. Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.
PO10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective reports and
design documentation, make effective presentations, and give and receive clear instructions.
PO11. Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
Programme Specific Outcomes (PSOs)
Computer Engineering graduate will be able to,
PSO1: Project Development: Successfully complete hardware and/or software related system or application projects, using the phases of the project development life cycle to meet the requirements of service and product industries; government projects; and automate other engineering stream projects.
PSO2: Domain Expertise: Demonstrate effective application of knowledge gained from different computer domains like data structures, databases, operating systems, computer networks, security, and parallel programming, in project development, research and higher education.
PSO3: Career Development: Achieve a successful career and entrepreneurship: the ability to employ modern computer languages, environments, and platforms in creating innovative career paths, to be an entrepreneur, and a zest for higher studies.
INDEX

Sr. No.  Title of Assignment                                                                Page No.
1        Read any image. Display the histogram, equalized histogram, and image with
         equalized histogram                                                                2
2        Read any image. Display the outputs of contrast stretching, intensity level
         slicing                                                                            8
3        Compare the results of any three edge detection algorithms on the same image
         dataset and do the analysis of the result                                          14
4        Compare the result of any two image segmentation algorithms on the same image
         data set                                                                           20
5        Image compression using any three compression techniques and compare the
         results                                                                            25
6        Implement image segmentation to detect an object in the background of an image    32
Experiment: 1
Assignment No: 1
Title: Read any image. Display the histogram, equalized histogram, and image with equalized histogram.
Roll No:          Class: B.E.          Subject: Image Processing          Sign:
ASSIGNMENT: 1
TITLE: Display the histogram, equalized histogram, and image with equalized histogram
OBJECTIVES:
To understand the concepts of Histogram and Histogram Equalization
THEORY-CONCEPT:

Histogram
A histogram is a graphical representation of the intensity distribution of an image. In simple terms, it represents the number of pixels for each intensity value considered.

[Figure: example histogram of a grayscale image.]

In the figure, the X-axis represents the tonal scale (black at the left and white at the right), and the Y-axis represents the number of pixels in the image. The histogram shows the number of pixels for each brightness level (from black to white); where there are more pixels, the peak at that brightness level is higher.
Histogram Equalization
Histogram equalization is used to enhance contrast. It is not necessary that contrast will always increase; there may be some cases where histogram equalization makes the image worse, and in those cases the contrast is decreased. Histogram equalization is a computer image processing technique used to improve contrast in images. It accomplishes this by effectively spreading out the most frequent intensity values, i.e. stretching out the intensity range of the image. This method usually increases the global contrast of images whose usable data is represented by close contrast values. This allows areas of lower local contrast to gain a higher contrast.

A color histogram of an image represents the number of pixels in each type of color component. Histogram equalization cannot be applied separately to the Red, Green and Blue components of the image, as it leads to dramatic changes in the image’s color balance. However, if the image is first converted to another color space, such as the HSL/HSV color space, then the algorithm can be applied to the luminance or value channel without resulting in changes to the hue and saturation of the image.
Adaptive Histogram Equalization
Adaptive histogram equalization differs from ordinary histogram equalization in that the adaptive method computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the lightness values of the image. It is therefore suitable for improving the local contrast and enhancing the definitions of edges in each region of an image.
Contrast Limited Adaptive Histogram Equalization
Contrast Limited AHE (CLAHE) differs from adaptive histogram equalization in its contrast limiting. In the case of CLAHE, the contrast limiting procedure is applied to each neighborhood from which a transformation function is derived. CLAHE was developed to prevent the over-amplification of noise that adaptive histogram equalization can give rise to.
Steps Involved
• Get the input image
• Generate the histogram for the image
• Find the local minima of the image
• Divide the histogram based on the local minima
• Have the specific gray levels for each partition of the histogram
• Apply the histogram equalization on each partition
Algorithm
1. Compute the histogram of the pixel values of the input image. The histogram places the value of each pixel f[i, j] into one of L uniformly spaced buckets h[k], where L = 2^8 = 256 for an 8-bit image of dimension M × N.
2. Calculate the cumulative distribution function:
   CDF(k) = Σ (m = 0 to k) h[m]
3. Scale the input image using the cumulative distribution function to produce the output image:
   g[i, j] = round( (CDF(f[i, j]) − CDF_min) / (M × N − CDF_min) × (L − 1) )
   where CDF_min is the smallest non-zero value of the cumulative distribution function.
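The algorithm above can be sketched in C++ as a minimal illustration (the function name equalize and the raw 8-bit grayscale buffer layout are assumptions for this sketch, not part of the manual):

#include <cmath>
#include <cstdint>
#include <vector>

// Histogram-equalize an 8-bit grayscale image stored as a flat buffer.
std::vector<uint8_t> equalize(const std::vector<uint8_t>& img) {
    const int L = 256;
    std::vector<long> hist(L, 0);
    for (uint8_t p : img) hist[p]++;                   // step 1: histogram

    std::vector<long> cdf(L, 0);                       // step 2: cumulative distribution
    long running = 0;
    for (int k = 0; k < L; k++) { running += hist[k]; cdf[k] = running; }

    long cdf_min = 0;                                  // smallest non-zero CDF value
    for (int k = 0; k < L; k++) { if (cdf[k] > 0) { cdf_min = cdf[k]; break; } }

    double denom = (double)img.size() - (double)cdf_min;
    if (denom <= 0) return img;                        // flat image: nothing to equalize
    std::vector<uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); i++)       // step 3: rescale via the CDF
        out[i] = (uint8_t)std::lround((cdf[img[i]] - cdf_min) / denom * (L - 1));
    return out;
}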
CLAHE (Contrast Limited Adaptive Histogram Equalization)
The histogram equalization above considers the global contrast of the image, and in many cases that is not a good idea. Even though the contrast of the image is improved, we may lose some information due to over-brightness, because the histogram is not confined to a local region.

Adaptive histogram equalization helps to solve this issue. In this method, the image is divided into small blocks, and each of these blocks is histogram equalized.
[Figure: the same image after adaptive histogram equalization.]
CONCLUSION/ANALYSIS:
We have studied the concepts of the histogram, equalized histogram, and image with equalized histogram.
Experiment: 2
Assignment No: 2
Title: Display the outputs of contrast stretching, intensity level slicing.
Roll No:          Class: B.E.          Subject: Image Processing          Sign:
ASSIGNMENT: 2
TITLE:
Display the outputs of contrast stretching, intensity level slicing.
OBJECTIVES:
1. Understand the concept of Contrast Stretching of an image.
2. Understand the concept of Intensity level Slicing of an image
THEORY-CONCEPT
Contrast Stretching
Contrast stretching is an image enhancement method which attempts to improve an image by stretching the range of intensity values.

Here, we stretch the minimum and maximum intensity values present to the possible minimum and maximum intensity values.

Example: If the minimum intensity value (r_min) present in the image is 100, then it is stretched to the possible minimum intensity value 0. Likewise, if the maximum intensity value (r_max) is less than the possible maximum intensity value 255, then it is stretched out to 255. (0–255 is taken as the standard minimum and maximum intensity range for 8-bit images.)

Note: Contrast stretching is only possible if the minimum intensity value and maximum intensity value are not equal to the possible minimum and maximum intensity values. Otherwise, the image generated after contrast stretching will be the same as the input image.
General Formula for Contrast Stretching:

s = (r − r_min) / (r_max − r_min) × (I_max − I_min) + I_min

For I_min = 0 and I_max = 255 (for a standard 8-bit grayscale image) this reduces to:

s = (r − r_min) / (r_max − r_min) × 255

where,
r = current pixel intensity value
r_min = minimum intensity value present in the whole image
r_max = maximum intensity value present in the whole image
Note: the output intensity value s should be rounded to the nearest integer value.

[Figure: an input image and its contrast-stretched output.]
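A minimal C++ sketch of this formula for I_min = 0 and I_max = 255 follows (the helper name stretch and the raw-buffer image layout are assumptions for illustration):

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Stretch the intensity range [r_min, r_max] of an 8-bit image onto [0, 255].
std::vector<uint8_t> stretch(const std::vector<uint8_t>& img) {
    auto [lo, hi] = std::minmax_element(img.begin(), img.end());
    int r_min = *lo, r_max = *hi;
    if (r_min == r_max) return img;            // flat image: nothing to stretch
    std::vector<uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); i++)
        out[i] = (uint8_t)std::lround((img[i] - r_min) * 255.0 / (r_max - r_min));
    return out;
}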
Gray Level Slicing:
Gray level slicing is a technique used in image processing to extract a set of pixels from an image based on their gray level values. This technique can be used to select a specific range of gray levels from an image, or to select all pixels with a gray level value above or below a certain threshold.

Gray level slicing is commonly used in image processing tasks such as image thresholding, where a specific range of gray levels is selected from an image and all other pixels are set to black or white. This technique can also be used to extract features from an image, or to remove background noise.

Slicing acts like a band-pass filter on intensity: the gray levels inside the band of interest are emphasized, while the remaining levels are either suppressed or left at their original values, depending on whether the background must be preserved. A related approach, bit plane slicing, generates binary images by decomposing a gray-level image into its individual bit positions; a bit plane is the set of bits occupying a given position in the binary representation of each pixel of a digital discrete signal such as an image or a sound. Other point operations include the log transformation, which expands the darker regions of an image, and the identity transformation, which leaves pixel values unchanged.
Gray level slicing thus creates a new image from an existing image by thresholding: only pixels that meet a certain criterion are included (highlighted) in the new image. It is often used to highlight certain features in an image, or to remove background noise.
What Is Gray Level in Image Processing?
A gray level is the intensity value of a pixel in a grayscale image. In an 8-bit image there are 256 gray levels, from 0 (black) to 255 (white). In a histogram of such an image, the horizontal axis runs from 0 to 255, and the vertical axis is determined by the number of pixels at each level. More generally, a k-bit image has 2^k gray levels, so the gray level resolution of an image is determined by the number of bits used per pixel.
Intensity Level Slicing in Image Processing
Intensity level slicing highlights a set of specific intensities in an image, so that each selected gray level region is distinguished from the rest of the image. For example, a region of interest might span the intensity range 80 to 150, and all pixels in that range would be highlighted (a sketch follows below).
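A minimal C++ sketch of intensity level slicing follows (a hypothetical helper; the [low, high] band and the raw-buffer layout are assumptions): pixels inside the band are highlighted, and the background is either kept or zeroed.

#include <cstdint>
#include <vector>

// Highlight the gray levels in [low, high]; optionally preserve the rest.
std::vector<uint8_t> slice(const std::vector<uint8_t>& img,
                           uint8_t low, uint8_t high, bool keep_background) {
    std::vector<uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); i++) {
        if (img[i] >= low && img[i] <= high)
            out[i] = 255;                       // highlight the selected band
        else
            out[i] = keep_background ? img[i] : 0;
    }
    return out;
}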
Intensity Transformations in Images
Intensity transforms can be used in many ways. Contrast stretching increases the intensity differences between pixels so that bright pixels become brighter relative to the original image, while contrast reduction compresses the intensity range. A brightening transform increases the overall brightness of the original image, and image negation (inversion) makes the bright pixels of the original image dark and the dark pixels bright.
CONCLUSION/ANALYSIS:
We have studied the concepts of contrast stretching and intensity level slicing.
Experiment: 3
Assignment No: 3
Title: Study any three edge detection algorithms.
Roll No:          Class: B.E.          Subject: Image Processing          Sign:
ASSIGNMENT: 3
TITLE: Compare any three edge detection algorithms
OBJECTIVES: To study and understand different edge detection algorithms.
THEORY:
Edges are significant local changes of intensity in a digital image. An edge can be defined as a set of connected pixels that forms a boundary between two disjoint regions. There are three types of edges:
• Horizontal edges
• Vertical edges
• Diagonal edges
Edge detection is a method of segmenting an image into regions of discontinuity. It is a widely used technique in digital image processing tasks like:
• pattern recognition
• image morphology
• feature extraction
Edge detection allows users to observe the features of an image for a significant change in the gray level. This texture indicates the end of one region in the image and the beginning of another. It reduces the amount of data in an image and preserves the structural properties of an image.
Edge detection operators are of two types:
• Gradient-based operators, which compute first-order derivatives in a digital image, e.g. the Sobel operator, Prewitt operator, and Robert operator.
• Gaussian-based operators, which compute second-order derivatives in a digital image, e.g. the Canny edge detector and the Laplacian of Gaussian.
Sobel Operator: It is a discrete differentiation operator. It computes the gradient approximation of the image intensity function for image edge detection. At the pixels of an image, the Sobel operator produces either the normal to a vector or the corresponding gradient vector. It uses two 3 x 3 kernels or masks which are convolved with the input image to calculate the horizontal and vertical derivative approximations respectively (the standard kernels are):

Gx = | -1  0 +1 |        Gy = | -1 -2 -1 |
     | -2  0 +2 |             |  0  0  0 |
     | -1  0 +1 |             | +1 +2 +1 |

Advantages:
1. Simple and time-efficient computation
2. Very easy at searching for smooth edges
Limitations:
1. Diagonal direction points are not always preserved
2. Highly sensitive to noise
3. Not very accurate in edge detection
4. Detection of thick and rough edges does not give appropriate results
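To make the kernel convolution concrete, here is a minimal C++ sketch of the Sobel gradient magnitude (the helper name sobel and the row-major raw-buffer layout are assumptions; borders are left at zero for simplicity):

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Compute the Sobel gradient magnitude of a w x h 8-bit grayscale image.
std::vector<uint8_t> sobel(const std::vector<uint8_t>& img, int w, int h) {
    static const int gx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int gy[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    std::vector<uint8_t> out(img.size(), 0);
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            int sx = 0, sy = 0;
            for (int j = -1; j <= 1; j++)       // convolve both 3 x 3 kernels
                for (int i = -1; i <= 1; i++) {
                    int p = img[(y + j) * w + (x + i)];
                    sx += gx[j + 1][i + 1] * p;
                    sy += gy[j + 1][i + 1] * p;
                }
            int mag = (int)std::lround(std::hypot((double)sx, (double)sy));
            out[y * w + x] = (uint8_t)std::min(mag, 255);  // clamp to 8 bits
        }
    }
    return out;
}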
Prewitt Operator: This operator is similar to the Sobel operator. It also detects the vertical and horizontal edges of an image. It is one of the best ways to detect the orientation and magnitude of an image. It uses the kernels or masks:

Gx = | -1  0 +1 |        Gy = | -1 -1 -1 |
     | -1  0 +1 |             |  0  0  0 |
     | -1  0 +1 |             | +1 +1 +1 |

Advantages:
1. Good performance at detecting vertical and horizontal edges
2. Best operator to detect the orientation of an image
Limitations:
1. The magnitude of the coefficients is fixed and cannot be changed
2. Diagonal direction points are not always preserved
Robert Operator: This gradient-based operator computes the sum of squares of the differences between diagonally adjacent pixels in an image through discrete differentiation. Then the gradient approximation is made. It uses the following 2 x 2 kernels or masks:

Gx = | +1  0 |        Gy = |  0 +1 |
     |  0 -1 |             | -1  0 |

Advantages:
1. Detection of edges and orientation is very easy
2. Diagonal direction points are preserved
Limitations:
1. Very sensitive to noise
2. Not very accurate in edge detection
Canny Operator: It is a Gaussian-based operator for detecting edges. This operator is not susceptible to noise. It extracts image features without affecting or altering the features. The Canny edge detector has an advanced algorithm derived from the previous work on the Laplacian of Gaussian operator. It is widely used as an optimal edge detection technique. It detects edges based on three criteria:
1. Low error rate
2. Edge points must be accurately localized
3. There should be just one single edge response
Advantages:
1. It has good localization
2. It extracts image features without altering the features
3. Less sensitive to noise
Limitations:
1. There are false zero crossings
2. Complex computation and time consuming
Some real-world applications of image edge detection:
• medical imaging, study of anatomical structure
• locating objects in satellite images
• automatic traffic control systems
• face recognition and fingerprint recognition
CONCLUSION/ANALYSIS:
We have studied and compared different types of edge detection algorithms in image processing and their real-world applications.
Experiment: 4
Assignment No: 4
Title: Study and understand image segmentation algorithms.
Roll No:          Class: B.E.          Subject: Image Processing          Sign:
ASSIGNMENT: 4
TITLE: Image Segmentation Techniques.
OBJECTIVES: To study and understand image segmentation techniques.
THEORY:
What Is Image Segmentation?
Image segmentation is a method of dividing a digital image into subgroups called
image segments, reducing the complexity of the image and enabling further processing
or analysis of each image segment. Technically, segmentation is the assignment of
labels to pixels to identify objects, people, or other important elements in the image.
A common use of image segmentation is in object detection. Instead of processing the
entire image, a common practice is to first use an image segmentation algorithm to
find objects of interest in the image. Then, the object detector can operate on a
bounding box already defined by the segmentation algorithm. This prevents the
detector from processing the entire image, improving accuracy and reducing inference
time.
Image segmentation is a key building block of computer vision technologies and
algorithms. It is used for many practical applications including medical image
analysis, computer vision for autonomous vehicles, face recognition and detection,
video surveillance, and satellite image analysis.
How Does Image Segmentation Work?
Image segmentation is a function that takes image inputs and produces an output. The
output is a mask or a matrix with various elements specifying the object class or
instance to which each pixel belongs.
Several relevant heuristics, or high-level image features, can be useful for image segmentation. These features are the basis for standard image segmentation algorithms that use clustering techniques like edges and histograms.
There are various neural network designs and implementations suitable for image segmentation. They usually contain the same basic components:
• An encoder: a series of layers that extract image features using progressively deeper, narrower filters. The encoder might be pre-trained on a similar task (e.g., image recognition), allowing it to leverage its existing knowledge to perform segmentation tasks.
• A decoder: a series of layers that gradually convert the encoder’s output into a segmentation mask corresponding with the input image’s pixel resolution.
• Skip connections: multiple long-range neural network connections allowing the model to identify features at different scales to enhance model accuracy.
Image Segmentation Techniques:
1. Edge-Based Segmentation
Edge-based segmentation is a popular image processing technique that identifies the edges of various objects in a given image. It helps locate features of associated objects in the image using the information from the edges. Edge detection helps strip images of redundant information, reducing their size and facilitating analysis. Edge-based segmentation algorithms identify edges based on contrast, texture, color, and saturation variations. They can accurately represent the borders of objects in an image using edge chains comprising the individual edges.
2. Threshold-Based Segmentation
Thresholding is the simplest image segmentation method, dividing pixels based on their intensity relative to a given value or threshold. It is suitable for segmenting objects with higher intensity than other objects or backgrounds. The threshold value T can work as a constant in low-noise images. In some cases, it is possible to use dynamic thresholds. Thresholding divides a grayscale image into two segments based on their relationship to T, producing a binary image (see the sketch below).
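A minimal C++ sketch of this binary thresholding follows (a hypothetical helper; the raw-buffer layout is an assumption): pixels above T belong to the object, the rest to the background.

#include <cstdint>
#include <vector>

// Produce a binary segmentation mask from a grayscale image and threshold T.
std::vector<uint8_t> threshold(const std::vector<uint8_t>& img, uint8_t T) {
    std::vector<uint8_t> mask(img.size());
    for (std::size_t i = 0; i < img.size(); i++)
        mask[i] = (img[i] > T) ? 255 : 0;      // object vs. background
    return mask;
}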
3. Region-Based Segmentation
Region-based segmentation involves dividing an image into regions with similar
characteristics. Each region is a group of pixels, which the algorithm locates via a seed
point. Once the algorithm finds the seed points, it can grow regions by adding more
pixels or shrinking and merging them with other points.
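As a sketch of the region-growing idea, the following NumPy implementation grows a region from a single seed point over 4-connected neighbours; the intensity tolerance tol and the seed coordinates are illustrative parameters:

import numpy as np
from collections import deque

def region_grow(gray, seed, tol=10):
    # Grow a region from `seed` over 4-connected pixels whose intensity
    # differs from the seed pixel by at most `tol`; returns a boolean mask.
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = int(gray[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and abs(int(gray[ny, nx]) - seed_val) <= tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

# Usage (seed coordinates are a placeholder guess):
# mask = region_grow(gray_image, seed=(100, 150), tol=10)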
4. Cluster-Based Segmentation
Clustering algorithms are unsupervised classification algorithms that help identify
hidden information in images. They augment human vision by isolating clusters,
shadings, and structures. The algorithm divides images into clusters of pixels with
similar characteristics, separating data elements and grouping similar elements into
clusters.
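A minimal sketch of cluster-based segmentation with OpenCV's k-means, clustering pixels by color; the choice of k = 3 and the filename are illustrative assumptions:

import cv2
import numpy as np

img = cv2.imread('input.jpg')
pixels = img.reshape(-1, 3).astype(np.float32)  # one row per pixel (B, G, R)

# k-means with k = 3 color clusters (an illustrative choice)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Paint every pixel with its cluster centre to visualise the segments
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)
cv2.imwrite('segmented.jpg', segmented)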
5. Watershed Segmentation
A watershed is a transformation defined on a grayscale image. Watershed segmentation
algorithms treat images like topographic maps, with pixel brightness determining
elevation (height). This technique detects lines forming ridges and basins, marking the
areas between the watershed lines. It divides images into multiple regions based on
pixel height, grouping pixels with the same gray value.
The watershed technique has several important use cases, including medical image
processing. For example, it can help identify differences between lighter and darker
regions in an MRI scan, potentially assisting with diagnosis.
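The sketch below follows the standard OpenCV marker-based watershed pipeline; the filename and parameters such as the 0.7 distance-transform factor are conventional, illustrative choices rather than values prescribed by this manual:

import cv2
import numpy as np

img = cv2.imread('input.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Binarise with Otsu, then estimate sure background and sure foreground
_, thresh = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = np.ones((3, 3), np.uint8)
sure_bg = cv2.dilate(thresh, kernel, iterations=3)
dist = cv2.distanceTransform(thresh, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, 0)
sure_fg = sure_fg.astype(np.uint8)
unknown = cv2.subtract(sure_bg, sure_fg)

# Label the basins, mark unknown territory as 0, and flood
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0
markers = cv2.watershed(img, markers)
img[markers == -1] = (0, 0, 255)  # watershed ridge lines drawn in red
cv2.imwrite('watershed.jpg', img)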
Conclusion:
Successfully detected the object in the background of an image using image segmentation techniques.
Experiment: 5
Assignment No: 5
Title: Image Compression Techniques
Roll No:
Class: B.E.
Subject: Image Processing
Sign:
ASSIGNMENT: 5
TITLE:
Image Compression Techniques.
OBJECTIVES:
To study and understand Image Compression Techniques.
THEORY:
IMAGE COMPRESSION:
Image compression addresses the problem of reducing the amount of data required to
represent a digital image. It is a process intended to yield a compact representation of an
image, thereby reducing the image storage/transmission requirements. Compression is
achieved by the removal of one or more of the three basic data redundancies:
1. Coding Redundancy
2. Interpixel Redundancy
3. Psychovisual Redundancy
Coding redundancy is present when less than optimal code words are used. Interpixel
redundancy results from correlations between the pixels of an image. Psychovisual
redundancy is due to data that is ignored by the human visual system (i.e., visually non-essential information).
Image compression techniques reduce the number of bits required to represent an image by
taking advantage of these redundancies. An inverse process called decompression
(decoding) is applied to the compressed data to get the reconstructed image. The objective
of compression is to reduce the number of bits as much as possible, while keeping the
resolution and the visual quality of the reconstructed image as close to the original image
as possible. Image compression systems are composed of two distinct structural blocks: an
encoder and a decoder.
STEPS INVOLVED IN COMPRESSING IMAGE:
[Block diagram: Input image → Source encoder → Channel encoder → Channel → Channel decoder → Source decoder → Output image]
The image f(x,y) is fed into the encoder, which creates a set of symbols from the input data and uses them to represent the image.
If we let n1 and n2 denote the number of information-carrying units (usually bits) in the original and encoded images respectively, the relative data redundancy RD of the first data set can be defined as
RD = 1 - 1/CR
where CR, commonly called the compression ratio, is given by
CR = n1/n2
As shown in the figure, the encoder is responsible for reducing the coding, interpixel and psychovisual redundancies of the input image.
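As a worked example (the bit counts are invented for illustration), a few lines of Python show how CR and RD relate:

n1 = 524_288  # bits in the original image (e.g., 256 x 256 pixels at 8 bits)
n2 = 131_072  # bits in the encoded image (illustrative)

CR = n1 / n2      # compression ratio: 4.0
RD = 1 - 1 / CR   # relative data redundancy: 0.75
print(CR, RD)     # i.e., 75% of the original data is redundant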
2. IMAGE COMPRESSION TECHNIQUES
The image compression techniques are broadly classified into two categories, depending on whether or not an exact replica of the original image can be reconstructed from the compressed image. These are:
1. Lossless technique
2. Lossy technique
Lossless compression technique
In lossless compression techniques, the original image can be perfectly recovered from the compressed (encoded) image. These techniques are also called noiseless, since they do not add noise to the signal (image). Lossless compression is also known as entropy coding, since it uses statistics/decomposition techniques to eliminate or minimize redundancy. It is used only for a few applications with stringent requirements, such as medical imaging. The following techniques are included in lossless compression:
1. Run length encoding
2. Huffman encoding
3. LZW coding
4. Area coding
Lossy compression technique:
Lossy schemes provide much higher compression ratios than lossless schemes. They are widely used, since the quality of the reconstructed images is adequate for most applications. Under this scheme, the decompressed image is not identical to the original image, but is reasonably close to it.
Major performance considerations of a lossy compression scheme include:
➢ Compression ratio
➢ Signal-to-noise ratio
➢ Speed of encoding and decoding
Lossy compression techniques include the following schemes:
➢ Lossy predictive coding
➢ Transform coding
➢ Wavelet coding
LOSSLESS COMPRESSION TECHNIQUE
1. Run Length Encoding:
Run-length encoding (RLE) is a very simple form of data compression in which runs of
data (that is, sequences in which the same data value occurs in many consecutive data
elements) are stored as a single data value and count, rather than as the original run. This
is most useful on data that contains many such runs: for example, relatively simple
graphic images such as icons, line drawings, and animations.
Let us take a hypothetical single scan line, with B representing a black pixel and W
representing white:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
If we apply the run-length encoding (RLE) data compression algorithm to the above hypothetical scan line, we get the following: 12W1B12W3B24W1B14W, i.e., 12 white pixels, 1 black pixel, 12 white pixels, 3 black pixels, and so on.
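A minimal Python sketch of run-length encoding that reproduces the result above:

def rle_encode(line):
    # Encode runs of identical symbols as <count><symbol>, e.g. 'WWB' -> '2W1B'
    out = []
    i = 0
    while i < len(line):
        j = i
        while j < len(line) and line[j] == line[i]:
            j += 1
        out.append(str(j - i) + line[i])
        i = j
    return ''.join(out)

scan_line = 'W' * 12 + 'B' + 'W' * 12 + 'BBB' + 'W' * 24 + 'B' + 'W' * 14
print(rle_encode(scan_line))  # prints 12W1B12W3B24W1B14W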
2. Huffman Coding:
Dr. David A. Huffman proposed Huffman coding in 1952 as a method for constructing minimum-redundancy codes. It is a general technique for coding symbols based on their statistical occurrence frequencies (probabilities); when compressing an image, the pixels are treated as symbols. Huffman's greedy algorithm looks at the occurrence of each character and encodes it as a binary string in an optimal way. Huffman coding is a form of statistical coding which attempts to reduce the number of bits required to represent a string of symbols. The algorithm accomplishes its goal by allowing code words to vary in length: shorter codes are assigned to the most frequently used symbols, and longer codes to the symbols which appear less frequently in the string (that is where the statistical part comes in). Code word lengths are no longer fixed, as in ASCII; they vary and are shorter for the more frequently used characters.
Huffman code is a prefix code. This means that the (binary) code of any symbol is not the
prefix of the code of any other symbol. Most image coding standards use lossy techniques
in the earlier stages of compression and use Huffman coding as the final step.
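A compact Python sketch of Huffman code construction using a min-heap; the input string is an arbitrary example:

import heapq
from collections import Counter

def huffman_codes(data):
    # Repeatedly merge the two lightest nodes, prepending '0'/'1'
    # to the codes of the symbols contained in each node.
    heap = [[freq, [sym, '']] for sym, freq in Counter(data).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = '0' + pair[1]
        for pair in hi[1:]:
            pair[1] = '1' + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

print(huffman_codes('AAAABBBCCD'))
# The most frequent symbol ('A') receives the shortest code,
# and the resulting code is a prefix code as described above.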
3. LZW Coding:
LZW (Lempel–Ziv–Welch) is a dictionary-based coding scheme. Dictionary-based coding can be static or dynamic. In static dictionary coding, the dictionary is fixed during the encoding and decoding processes. In dynamic dictionary coding, the dictionary is updated on the fly. LZW is widely used in the computer industry and is implemented as the compress command on UNIX. LZW compression has been integrated into a variety of mainstream imaging file formats, including the Graphics Interchange Format (GIF), the Tagged Image File Format (TIFF), and the Portable Document Format (PDF).
In the LZW coding process, a “codebook” or “dictionary” containing the source symbols to be coded is constructed. For 8-bit monochrome images, the first 256 words of the dictionary are assigned to the gray values 0, 1, 2…255.
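A minimal Python sketch of LZW encoding. For brevity the initial dictionary is built from the characters present in the input, rather than from all 256 gray values as it would be for an 8-bit image:

def lzw_encode(data):
    # Start with a dictionary of single symbols
    dictionary = {ch: i for i, ch in enumerate(sorted(set(data)))}
    w, out = '', []
    for ch in data:
        if w + ch in dictionary:
            w += ch            # keep extending the current phrase
        else:
            out.append(dictionary[w])
            dictionary[w + ch] = len(dictionary)  # learn the new phrase
            w = ch
    if w:
        out.append(dictionary[w])
    return out

print(lzw_encode('ABABABA'))  # [0, 1, 2, 4] with A=0, B=1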
4. Area Coding:
Area coding is an enhanced form of run-length coding, reflecting the two-dimensional character of images. This is a significant advance over the other lossless methods. For coding an image, it does not make much sense to interpret it as a sequential stream, as it is in fact an array of sequences building up a two-dimensional object. The algorithms for area coding try to find rectangular regions with the same characteristics. These regions are coded in a descriptive form as an element with two points and a certain structure. This type of coding can be highly effective, but it has the drawback of being a nonlinear method, which cannot be implemented in hardware.
Conclusion:
We have studied different image compression techniques.
Experiment: 6
Assignment No: 6
Title: Mini Project - Implement image segmentation to detect an object in the background of an image
Roll No:
Class: B.E.
Subject: Image Processing
Sign:
ASSIGNMENT: 6
TITLE:
Implement image segmentation to detect an object in the background of an image.
OBJECTIVES:
To study and understand image segmentation to detect an object in the background of an image.
THEORY:
Image Segmentation:
Let’s understand image segmentation using a simple example. Consider the below image:
There’s only one object here – a dog. We can build a straightforward cat-dog classifier
model and predict that there’s a dog in the given image. But what if we have both a cat
and a dog in a single image?
We can train a multi-label classifier, in that instance. Now, there’s another caveat – we
won’t know the location of either animal/object in the image.
That’s where image localization comes into the picture (no pun intended!). It helps us to
identify the location of a single object in the given image. In case we have multiple
objects present, we then rely on the concept of object detection (OD). We can predict the
location along with the class for each object using OD.
Before detecting the objects and even before classifying the image, we need to understand
what the image consists of. Enter – Image Segmentation.
So how does image segmentation work?
We can divide or partition the image into various parts called segments. It’s not a great
idea to process the entire image at the same time as there will be regions in the image
which do not contain any information. By dividing the image into segments, we can make
use of the important segments for processing the image. That, in a nutshell, is how image
segmentation works.
An image is a collection or set of different pixels. We group together the pixels that have
similar attributes using image segmentation. Take a moment to go through the below
visual (it’ll give you a practical idea of image segmentation):
Object detection builds a bounding box corresponding to each class in the image. But it
tells us nothing about the shape of the object. We only get the set of bounding box
coordinates. We want to get more information – this is too vague for our purposes.
Image segmentation creates a pixel-wise mask for each object in the image. This
technique gives us a far more granular understanding of the object(s) in the image.
The Different Types of Image Segmentation:
We can broadly divide image segmentation techniques into two types. Consider the
below images:
Can we identify the difference between these two? Both images use image segmentation to identify and locate the people present.
• In image 1, every pixel belongs to a particular class (either background or person). Also, all the pixels belonging to a particular class are represented by the same color (background as black and person as pink). This is an example of semantic segmentation.
• Image 2 has also assigned a particular class to each pixel of the image. However, different objects of the same class have different colors (Person 1 as red, Person 2 as green, background as black, etc.). This is an example of instance segmentation.
If there are 5 people in an image, semantic segmentation will focus on classifying all the people as a single instance, whereas instance segmentation will identify each of these people individually.
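For the mini project itself, one workable approach is OpenCV's GrabCut, which separates a foreground object from the background given a rough bounding rectangle. The sketch below is one possible implementation, not the only valid one; the filename and the rectangle coordinates are placeholder assumptions:

import cv2
import numpy as np

img = cv2.imread('input.jpg')
mask = np.zeros(img.shape[:2], np.uint8)

# Rough rectangle around the object, (x, y, width, height) -- a placeholder
rect = (50, 50, img.shape[1] - 100, img.shape[0] - 100)

bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Keep pixels labelled (probably) foreground; zero out the background
obj = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
result = img * obj[:, :, np.newaxis].astype(np.uint8)
cv2.imwrite('object.jpg', result)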
Conclusion:
We have studied image segmentation to detect an object in the background of an image.