SINHGAD TECHNICAL EDUCATION SOCIETY'S
RMD SINHGAD SCHOOL OF ENGINEERING, Warje, Pune 411058
Department of Computer Engineering

LABORATORY MANUAL, AY: 2022-23
Laboratory Practice V: High Performance Computing (410250)
BE Computer Engineering (2019 Course), Semester II

TEACHING SCHEME: Practical: 2 Hrs/Week; CREDIT: 01
EXAMINATION: TW: 50 Marks, PR: 50 Marks

Prepared by: Mrs. Pradnya Kasture & Mrs. Shafali Gupta (Assistant Professors, Department of Computer Engineering)

Course Objectives
● To understand and implement searching and sorting algorithms
● To learn the fundamentals of GPU computing in the CUDA environment
● To illustrate the concepts of Artificial Intelligence/Machine Learning (AI/ML)
● To understand hardware acceleration
● To implement different deep learning models

Course Outcomes
CO1: Analyze and measure the performance of sequential and parallel algorithms.
CO2: Design and implement solutions for multicore/distributed/parallel environments.
CO3: Identify and apply suitable algorithms to solve AI/ML problems.
CO4: Apply deep neural network techniques for implementing linear regression and classification.
CO5: Apply convolutional neural network (CNN) techniques for implementing deep learning models.
CO6: Design and develop recurrent neural networks (RNN) for prediction.

Guidelines for Laboratory Conduction
● A list of recommended programming assignments and sample mini-projects is provided for reference.
● Referring to these, the course teacher or lab instructor may frame the assignments/mini-projects, taking into account the prerequisites, technological aspects, utility, and recent trends related to the respective courses.
● Preferably there should be multiple sets of assignments/mini-projects, distributed among batches of students.
● Real-world, application-based assignments/mini-projects create interest among learners and serve as a foundation for future research or business start-up projects.
● A mini-project can be completed in a group of 2 to 3 students.
● A software engineering approach with proper documentation is to be strictly followed.
● Use of open-source software is to be encouraged.
● The instructor may also set one assignment or mini-project that is suitable to the respective course but beyond the scope of the syllabus.
● Operating system recommended: 64-bit open-source Linux or its derivative
● Programming languages: object-oriented languages C++/Java/Python/R
● Programming tools recommended: Front end: Java/Perl/PHP/Python/Ruby/.NET; Back end: MongoDB/MySQL/Oracle; Database connectivity: ODBC/JDBC

Group 1

Assignment 1: Design and implement parallel Breadth First Search and Depth First Search based on existing algorithms using OpenMP. Use a tree or an undirected graph for BFS and DFS.

Parallel Breadth First Search
1. To design and implement parallel breadth first search, you need to divide the graph into smaller sub-graphs and assign each sub-graph to a different processor or thread.
2. Each processor or thread then performs a breadth first search on its assigned sub-graph concurrently with the other processors or threads.
3. Two methods: vertex by vertex, or level by level.

Breadth First Search
To design and implement parallel breadth first search using OpenMP, you can take the existing breadth first search algorithm and parallelize it using OpenMP's parallelization constructs.
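Before the task-based sample below, here is a minimal sketch of the "level by level" method mentioned above: the whole frontier of the current level is expanded in parallel, and the unvisited neighbours form the next frontier. Function and variable names are illustrative, not part of the prescribed assignment.

```cpp
#include <iostream>
#include <utility>
#include <vector>
#include <omp.h>
using namespace std;

// Level-synchronous BFS: each level's frontier is expanded in parallel.
void bfs_level_by_level(const vector<vector<int>>& graph, int start) {
    vector<bool> visited(graph.size(), false);
    vector<int> frontier{start};
    visited[start] = true;
    int level = 0;
    while (!frontier.empty()) {
        cout << "Level " << level++ << ":";
        for (int v : frontier) cout << " " << v;
        cout << endl;
        vector<int> next;
        #pragma omp parallel
        {
            vector<int> local;   // per-thread buffer: avoids contention on 'next'
            #pragma omp for nowait
            for (int i = 0; i < (int)frontier.size(); i++) {
                for (int neighbor : graph[frontier[i]]) {
                    // the shared 'visited' array must be protected
                    #pragma omp critical
                    if (!visited[neighbor]) {
                        visited[neighbor] = true;
                        local.push_back(neighbor);
                    }
                }
            }
            #pragma omp critical
            next.insert(next.end(), local.begin(), local.end());
        }
        frontier = move(next);
    }
}

int main() {
    // A small undirected graph as an adjacency list.
    vector<vector<int>> graph = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    bfs_level_by_level(graph, 0);
    return 0;
}
```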
Program Sample:

```cpp
#include <iostream>
#include <queue>
#include <vector>
#include <omp.h>
using namespace std;

void bfs(vector<vector<int>>& graph, int start, vector<bool>& visited) {
    queue<int> q;
    q.push(start);
    visited[start] = true;
    #pragma omp parallel
    {
        #pragma omp single
        {
            while (!q.empty()) {
                // drain the current contents of the queue into a batch
                vector<int> batch;
                while (!q.empty()) { batch.push_back(q.front()); q.pop(); }
                // one task per vertex in the batch; the tasks run in parallel
                for (int vertex : batch) {
                    #pragma omp task firstprivate(vertex) shared(graph, visited, q)
                    {
                        for (int neighbor : graph[vertex]) {
                            // protect the shared queue and visited array
                            #pragma omp critical
                            if (!visited[neighbor]) {
                                visited[neighbor] = true;
                                q.push(neighbor);
                            }
                        }
                    }
                }
                // wait until all tasks have pushed their discoveries
                #pragma omp taskwait
            }
        }
    }
}

void parallel_bfs(vector<vector<int>>& graph, int start) {
    vector<bool> visited(graph.size(), false);
    bfs(graph, start, visited);
}
```

1. In this implementation, the parallel_bfs function takes in a graph represented as an adjacency list, where each element in the list is a vector of neighboring vertices, and a starting vertex.
2. The bfs function uses a queue to keep track of the vertices to visit and a boolean visited array to keep track of which vertices have been visited. The #pragma omp parallel directive creates a parallel region, and the #pragma omp single directive creates a single execution context within that region.
3. Inside the while loop, the current contents of the queue are drained into a batch, and the #pragma omp task directive creates a task for each vertex in the batch; these tasks execute in parallel with one another.
4. The firstprivate clause is used to ensure that each task has its own copy of the vertex variable.
5. The shared queue and visited array are protected with omp critical (omp atomic can protect simple scalar updates), and the taskwait ensures the queue has been fully updated before the loop checks it again. This is just one possible implementation, and there are many ways to improve it depending on the specific requirements of your application.
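All of the OpenMP samples in this manual can be compiled with GCC's -fopenmp flag; a typical invocation (file name illustrative) is `g++ -fopenmp parallel_bfs.cpp -o parallel_bfs`, and the thread count can be controlled at run time with the OMP_NUM_THREADS environment variable, e.g. `OMP_NUM_THREADS=4 ./parallel_bfs`.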
Parallel Depth First Search

Parallel Depth-First Search
• Different subtrees can be searched concurrently.
• Subtrees can be very different in size, and it is difficult to estimate the size of a subtree rooted at a node.
• Dynamic load balancing is therefore required.

Parallel Depth-First Search: Dynamic Load Balancing
• When a processor runs out of work, it gets more work from another processor.
• This is done using work requests and responses on message-passing machines, and by locking and extracting work on shared-address-space machines.
• On reaching the final state at a processor, all processors terminate.
• Unexplored states can be conveniently stored as local stacks at the processors.
• The entire space is assigned to one processor to begin with.

Parameters in Parallel DFS: Work Splitting
• Work is split by splitting the stack into two.
• Ideally, we do not want either of the split pieces to be small.
• Select nodes near the bottom of the stack (node splitting), or select some nodes from each level (stack splitting).
• The second strategy generally yields a more even split of the space.

Load-Balancing Schemes
• Asynchronous round robin: each processor maintains a counter and makes requests in a round-robin fashion.
• Global round robin: the system maintains a global counter and requests are made in a round-robin fashion, globally.
• Random polling: request work from a randomly selected processor.

Analyzing DFS
• We cannot compute the serial work W or the parallel time analytically. Instead, we quantify the total overhead To in terms of W to compute scalability.
• For dynamic load balancing, idling time is subsumed by communication.
• We must quantify the total number of work requests in the system.

Load-Balancing Schemes: Analysis
• Asynchronous round robin has poor performance because it makes a large number of work requests.
• Global round robin has poor performance because of contention at the counter, although it makes the fewest requests.
• Random polling strikes a desirable compromise.

Termination Detection
• Processor P0 initially has all the work, and a weight of one is associated with it. When its work is partitioned and sent to another processor, P0 retains half of the weight and gives the other half to the processor receiving the work.
• If Pi is the recipient processor and wi is the weight at processor Pi, then after the first work transfer both w0 and wi are 0.5.
• Each time the work at a processor is partitioned, its weight is halved. When a processor completes its computation, it returns its weight to the processor from which it received the work.
• Termination is signaled when the weight w0 at processor P0 becomes one again and P0 has finished its work.

Program

```cpp
#include <iostream>
#include <stack>
#include <vector>
#include <omp.h>
using namespace std;

void dfs(vector<vector<int>>& graph, int start, vector<bool>& visited) {
    stack<int> s;
    s.push(start);
    visited[start] = true;
    #pragma omp parallel
    {
        #pragma omp single
        {
            while (!s.empty()) {
                // drain the current contents of the stack into a batch
                vector<int> batch;
                while (!s.empty()) { batch.push_back(s.top()); s.pop(); }
                // one task per vertex in the batch; the tasks run in parallel
                for (int vertex : batch) {
                    #pragma omp task firstprivate(vertex) shared(graph, visited, s)
                    {
                        for (int neighbor : graph[vertex]) {
                            // protect the shared stack and visited array
                            #pragma omp critical
                            if (!visited[neighbor]) {
                                visited[neighbor] = true;
                                s.push(neighbor);
                            }
                        }
                    }
                }
                // wait until all tasks have pushed their discoveries
                #pragma omp taskwait
            }
        }
    }
}

void parallel_dfs(vector<vector<int>>& graph, int start) {
    vector<bool> visited(graph.size(), false);
    dfs(graph, start, visited);
}
```

1. In this implementation, the parallel_dfs function takes in a graph represented as an adjacency list, where each element in the list is a vector of neighboring vertices, and a starting vertex.
2. The dfs function uses a stack to keep track of the vertices to visit and a boolean visited array to keep track of which vertices have been visited.
3. The #pragma omp parallel directive creates a parallel region, and the #pragma omp single directive creates a single execution context within that region.
4. Inside the while loop, the current contents of the stack are drained into a batch, and the #pragma omp task directive creates a task for each vertex in the batch; these tasks execute in parallel with one another. The firstprivate clause is used to ensure that each task has its own copy of the vertex variable.
5. This implementation is suitable for both a tree and an undirected graph, since both are represented as an adjacency list and the algorithm uses a stack to traverse the graph.
6. This is just one possible implementation, and there are many ways to improve it depending on the specific requirements of your application; here the shared resource stack is protected with omp critical (omp atomic can protect simple scalar updates).

Assignment 2: Write a program to implement parallel Bubble Sort and Merge Sort using OpenMP. Use existing algorithms and measure the performance of the sequential and parallel algorithms.

Bubble Sort
1. The complexity of bubble sort is Θ(n²).
2. Bubble sort is difficult to parallelize since the algorithm has no concurrency.
3. A simple variant, though, uncovers the concurrency.

Odd-Even Transposition
(Figure in the original slides: sorting n = 8 elements with the odd-even transposition sort algorithm; during each phase, n = 8 elements are compared.)
(Figure: the sequential odd-even transposition sort algorithm.)
1. After n phases of odd-even exchanges, the sequence is sorted.
2. Each phase of the algorithm (either odd or even) requires Θ(n) comparisons.
3. The serial complexity is therefore Θ(n²).

Parallel Odd-Even Transposition
1. Consider the one-item-per-processor case.
2. There are n iterations; in each iteration, each processor does one compare-exchange.
3. The parallel run time of this formulation is Θ(n).
4. This is cost optimal with respect to the base serial algorithm, but not with respect to the optimal one.
(Figure: parallel formulation of odd-even transposition.)

Parallel Odd-Even Transposition (block version)
1. Consider a block of n/p elements per processor.
2. The first step is a local sort.
3. In each subsequent step, the compare-exchange operation is replaced by the compare-split operation.
4. The parallel run time of the formulation is T_P = Θ((n/p) log(n/p)) for the local sort plus Θ(n) for the p compare-split phases (computation and communication).

Bubble Sort: Odd-Even Transposition

```cpp
#include <algorithm>
#include <iostream>
#include <vector>
#include <omp.h>
using namespace std;

void bubble_sort_odd_even(vector<int>& arr) {
    bool isSorted = false;
    while (!isSorted) {
        isSorted = true;
        // even phase: compare-exchange the pairs (0,1), (2,3), ...
        #pragma omp parallel for
        for (int i = 0; i < (int)arr.size() - 1; i += 2) {
            if (arr[i] > arr[i + 1]) {
                swap(arr[i], arr[i + 1]);
                isSorted = false;   // all racing writers store the same value
            }
        }
        // odd phase: compare-exchange the pairs (1,2), (3,4), ...
        #pragma omp parallel for
        for (int i = 1; i < (int)arr.size() - 1; i += 2) {
            if (arr[i] > arr[i + 1]) {
                swap(arr[i], arr[i + 1]);
                isSorted = false;
            }
        }
    }
}

int main() {
    vector<int> arr = {5, 2, 9, 1, 7, 6, 8, 3, 4};
    // Measure performance of parallel bubble sort using odd-even transposition
    double start = omp_get_wtime();
    bubble_sort_odd_even(arr);
    double end = omp_get_wtime();
    cout << "Parallel bubble sort using odd-even transposition time: "
         << end - start << endl;
    return 0;
}
```

1. This program uses OpenMP to parallelize the bubble sort algorithm. The #pragma omp parallel for directive tells the compiler to create a team of threads to execute the for loop within the block in parallel.
2. Each thread works on a different iteration of the loop, in this case comparing and swapping elements of the array.
3. For comparison, a plain sequential bubbleSort takes in an array and sorts it with two nested loops: the outer loop iterates from 0 to n-2 and the inner loop from 0 to n-i-1, where i is the index of the outer loop; the inner loop compares the current element with the next element and swaps them if the current element is greater. Its main function creates a sample array, calls bubbleSort, and prints the sorted array (see the baseline sketch after this list).
4. This is skeleton code: it may need some modification to work with specific inputs and requirements.
5. It is worth noting that bubble sort is not an efficient sorting algorithm, especially for large inputs, and it may not scale well as the number of threads grows; parallelizing bubble sort does not yield a large performance improvement, due to the nature of the algorithm itself.
6. In this implementation, the bubble_sort_odd_even function takes in an array and sorts it using the odd-even transposition algorithm. The outer while loop continues until the array is sorted. Inside the loop, the #pragma omp parallel for directive creates a parallel region and divides the loop iterations among the available threads, so the compare-exchange operations of one phase run in parallel, improving the performance of the algorithm.
7. The two #pragma omp parallel for loops inside the while loop, one for even indices and one for odd indices, let the threads process the even-indexed and odd-indexed pairs of each phase simultaneously while avoiding dependencies between neighboring pairs.
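The assignment also asks for the sequential timing. Below is a minimal baseline sketch, assuming bubble_sort_odd_even from the listing above is in the same file (it replaces the earlier main; the input size and values are illustrative, and a few thousand elements are needed before the parallel version can show any benefit):

```cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>
#include <omp.h>
using namespace std;

// Plain sequential bubble sort, used as the timing baseline.
void bubble_sort_sequential(vector<int>& arr) {
    int n = arr.size();
    for (int i = 0; i < n - 1; i++)          // after pass i, the last i+1 slots are final
        for (int j = 0; j < n - i - 1; j++)
            if (arr[j] > arr[j + 1])
                swap(arr[j], arr[j + 1]);
}

int main() {
    vector<int> data(20000);
    for (int& x : data) x = rand() % 100000;  // same random input for both runs

    vector<int> a = data;
    double t0 = omp_get_wtime();
    bubble_sort_sequential(a);
    double t1 = omp_get_wtime();
    cout << "Sequential bubble sort time: " << t1 - t0 << " s" << endl;

    vector<int> b = data;
    t0 = omp_get_wtime();
    bubble_sort_odd_even(b);                  // parallel version from above
    t1 = omp_get_wtime();
    cout << "Parallel odd-even transposition time: " << t1 - t0 << " s" << endl;
    return 0;
}
```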
Merge Sort

Parallel Merge Sort
Given a set of elements A = {a1, a2, ..., an}, A_odd and A_even are defined as the sets of elements of A with odd and even indices, respectively. For example, A_odd = {a1, a3, a5, ...} and A_even = {a2, a4, a6, ...}. Similarly, let B = {b1, ..., bn} be a second set of elements. We can then define the merge operation as:
Merge(A, B) = {a1, b1, a2, b2, a3, b3, ..., an, bn}
For example, if A = {1, 2, 3, 4} and B = {5, 6, 7, 8}, then Merge({1,2,3,4}, {5,6,7,8}) = {1, 5, 2, 6, 3, 7, 4, 8}.
Join(A, B) = (Merge(A, B), Odd-Even(A, B))
Example: suppose the set of elements is S = {2, 3, 6, 10, 15, 4, 5, 8} and we start with the two sorted sets A = {2, 6, 10, 15} and B = {3, 4, 5, 8}.

```cpp
#include <iostream>
#include <vector>
#include <omp.h>
using namespace std;

void merge(vector<int>& arr, int l, int m, int r) {
    int i, j, k;
    int n1 = m - l + 1;
    int n2 = r - m;
    vector<int> L(n1), R(n2);
    for (i = 0; i < n1; i++) L[i] = arr[l + i];
    for (j = 0; j < n2; j++) R[j] = arr[m + 1 + j];
    i = 0; j = 0; k = l;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) arr[k++] = L[i++];
        else              arr[k++] = R[j++];
    }
    // copy any remaining elements of L and R
    while (i < n1) arr[k++] = L[i++];
    while (j < n2) arr[k++] = R[j++];
}

void merge_sort(vector<int>& arr, int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2;
        #pragma omp task shared(arr)
        merge_sort(arr, l, m);
        #pragma omp task shared(arr)
        merge_sort(arr, m + 1, r);
        // both halves must be fully sorted before they are merged
        #pragma omp taskwait
        merge(arr, l, m, r);
    }
}

void parallel_merge_sort(vector<int>& arr) {
    #pragma omp parallel
    {
        #pragma omp single
        merge_sort(arr, 0, arr.size() - 1);
    }
}

int main() {
    vector<int> arr = {5, 2, 9, 1, 7, 6, 8, 3, 4};
    double start, end;

    // Measure performance of sequential merge sort
    start = omp_get_wtime();
    merge_sort(arr, 0, arr.size() - 1);
    end = omp_get_wtime();
    cout << "Sequential merge sort time: " << end - start << endl;

    // Measure performance of parallel merge sort
    arr = {5, 2, 9, 1, 7, 6, 8, 3, 4};
    start = omp_get_wtime();
    parallel_merge_sort(arr);
    end = omp_get_wtime();
    cout << "Parallel merge sort time: " << end - start << endl;
    return 0;
}
```
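Spawning a task for every recursive call adds overhead on small subarrays. A common refinement, shown here as a sketch (the cutoff value is illustrative and should be tuned; merge() is the function from the listing above), is to fall back to plain recursion below a threshold:

```cpp
// Variant of merge_sort that only spawns tasks for large subarrays.
const int CUTOFF = 1000;   // illustrative threshold

void merge_sort_cutoff(vector<int>& arr, int l, int r) {
    if (l >= r) return;
    int m = l + (r - l) / 2;
    if (r - l < CUTOFF) {
        merge_sort_cutoff(arr, l, m);      // small subarray: no tasking overhead
        merge_sort_cutoff(arr, m + 1, r);
    } else {
        #pragma omp task shared(arr)
        merge_sort_cutoff(arr, l, m);
        #pragma omp task shared(arr)
        merge_sort_cutoff(arr, m + 1, r);
        #pragma omp taskwait               // wait for both halves before merging
    }
    merge(arr, l, m, r);
}
```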
Assignment 3: Implement Min, Max, Sum and Average operations using parallel reduction.

```cpp
#include <climits>
#include <iostream>
#include <vector>
#include <omp.h>
using namespace std;

void min_reduction(vector<int>& arr) {
    int min_value = INT_MAX;
    #pragma omp parallel for reduction(min: min_value)
    for (int i = 0; i < (int)arr.size(); i++) {
        if (arr[i] < min_value) min_value = arr[i];
    }
    cout << "Minimum value: " << min_value << endl;
}

void max_reduction(vector<int>& arr) {
    int max_value = INT_MIN;
    #pragma omp parallel for reduction(max: max_value)
    for (int i = 0; i < (int)arr.size(); i++) {
        if (arr[i] > max_value) max_value = arr[i];
    }
    cout << "Maximum value: " << max_value << endl;
}

void sum_reduction(vector<int>& arr) {
    int sum = 0;
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < (int)arr.size(); i++) sum += arr[i];
    cout << "Sum: " << sum << endl;
}

void average_reduction(vector<int>& arr) {
    int sum = 0;
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < (int)arr.size(); i++) sum += arr[i];
    cout << "Average: " << (double)sum / arr.size() << endl;
}

int main() {
    vector<int> arr = {5, 2, 9, 1, 7, 6, 8, 3, 4};
    min_reduction(arr);
    max_reduction(arr);
    sum_reduction(arr);
    average_reduction(arr);
}
```

1. The min_reduction function finds the minimum value in the input array using the #pragma omp parallel for reduction(min: min_value) directive, which creates a parallel region and divides the loop iterations among the available threads. Each thread performs the comparison in parallel and updates its private copy of min_value whenever a smaller value is found.
2. Similarly, the max_reduction function finds the maximum value in the array, sum_reduction finds the sum of the elements, and average_reduction finds the average of the elements by dividing the sum by the size of the array.
3. The reduction clause combines the results of the multiple threads into a single value. The min and max operators are used for the min_reduction and max_reduction functions, respectively, and the + operator for the sum_reduction and average_reduction functions. In the main function, a vector is created and min_reduction, max_reduction, sum_reduction and average_reduction are called to compute the min, max, sum and average, respectively.

Assignment 4: Write a CUDA program for:
1. Addition of two large vectors
2. Matrix multiplication using CUDA C

Vector Addition

```cpp
#include <iostream>
#include <cuda_runtime.h>
using namespace std;

__global__ void addVectors(int* A, int* B, int* C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

int main() {
    int n = 1000000;
    int* A, * B, * C;
    int size = n * sizeof(int);

    // Allocate (pinned) memory on the host
    cudaMallocHost(&A, size);
    cudaMallocHost(&B, size);
    cudaMallocHost(&C, size);

    // Initialize the vectors
    for (int i = 0; i < n; i++) { A[i] = i; B[i] = i * 2; }

    // Allocate memory on the device
    int* dev_A, * dev_B, * dev_C;
    cudaMalloc(&dev_A, size);
    cudaMalloc(&dev_B, size);
    cudaMalloc(&dev_C, size);

    // Copy data from host to device
    cudaMemcpy(dev_A, A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_B, B, size, cudaMemcpyHostToDevice);

    // Launch the kernel
    int blockSize = 256;
    int numBlocks = (n + blockSize - 1) / blockSize;
    addVectors<<<numBlocks, blockSize>>>(dev_A, dev_B, dev_C, n);

    // Copy data from device to host
    cudaMemcpy(C, dev_C, size, cudaMemcpyDeviceToHost);

    // Print the results
    for (int i = 0; i < 10; i++) cout << C[i] << " ";
    cout << endl;

    // Free memory
    cudaFree(dev_A); cudaFree(dev_B); cudaFree(dev_C);
    cudaFreeHost(A); cudaFreeHost(B); cudaFreeHost(C);
    return 0;
}
```

1. In this program, the `addVectors` kernel takes in the two input vectors `A` and `B`, the output vector `C`, and the size of the vectors `n`. The kernel uses the `blockIdx.x` and `threadIdx.x` variables to calculate the index `i` of the current thread. If the index is less than `n`, the kernel performs the addition operation `C[i] = A[i] + B[i]`.
2. In the `main` function, the program first allocates memory for the input and output vectors on the host and initializes them. Then it allocates memory for the vectors on the device and copies the data from the host to the device using `cudaMemcpy`.
3. The kernel is then launched with ⌈n/blockSize⌉ blocks of blockSize threads each; afterwards the result is copied back to the host, printed, and all device and host memory is freed.
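The CUDA samples in this manual can be built with NVIDIA's nvcc compiler; a typical invocation (file name illustrative) is `nvcc vector_add.cu -o vector_add` followed by `./vector_add`. An NVIDIA GPU and the CUDA toolkit must be installed on the machine.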
Matrix Multiplication

Matrix-Matrix Multiplication
● Consider two n × n matrices A and B partitioned into p blocks A(i,j) and B(i,j) (0 ≤ i, j < √p) of size (n/√p) × (n/√p) each.
● Process P(i,j) initially stores A(i,j) and B(i,j) and computes block C(i,j) of the result matrix.
● Computing submatrix C(i,j) requires all submatrices A(i,k) and B(k,j) for 0 ≤ k < √p.
● All-to-all broadcast blocks of A along rows and blocks of B along columns.
● Perform local submatrix multiplication.

Example

A = | 2 1 5 3 |        B = | 9 2  4  4 |
    | 6 1 2 3 |            | 1 9  8 -8 |
    | 0 7 1 6 |            | 3 6  7  2 |
    | 4 5 6 5 |            | 4 0 -8  5 |

The matrices are divided into four 2 × 2 blocks and aligned so that each process holds one block of A and one block of B. Each process first multiplies its local pair of blocks, producing a partial result for its block of C. Next, a shifting operation is performed: A is shifted one block to the left and B one block up (with wrap-around), and a second local matrix multiplication is performed. The two local products are added to give the final blocks C(0,0), C(0,1), C(1,0) and C(1,1). (The original slides show the numeric partial products for each step.)

```cpp
#include <iostream>
#include <cuda_runtime.h>

__global__ void matmul(int* A, int* B, int* C, int N) {
    int Row = blockIdx.y * blockDim.y + threadIdx.y;
    int Col = blockIdx.x * blockDim.x + threadIdx.x;
    if (Row < N && Col < N) {
        int Pvalue = 0;
        for (int k = 0; k < N; k++) {
            Pvalue += A[Row * N + k] * B[k * N + Col];
        }
        C[Row * N + Col] = Pvalue;
    }
}

int main() {
    int N = 512;
    int size = N * N * sizeof(int);
    int* A, * B, * C;
    int* dev_A, * dev_B, * dev_C;
    cudaMallocHost(&A, size);
    cudaMallocHost(&B, size);
    cudaMallocHost(&C, size);
    cudaMalloc(&dev_A, size);
    cudaMalloc(&dev_B, size);
    cudaMalloc(&dev_C, size);

    // Initialize matrices A and B
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            A[i * N + j] = i * N + j;
            B[i * N + j] = j * N + i;
        }
    }

    cudaMemcpy(dev_A, A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_B, B, size, cudaMemcpyHostToDevice);

    dim3 dimBlock(16, 16);
    dim3 dimGrid(N / dimBlock.x, N / dimBlock.y);   // N is a multiple of 16
    matmul<<<dimGrid, dimBlock>>>(dev_A, dev_B, dev_C, N);

    cudaMemcpy(C, dev_C, size, cudaMemcpyDeviceToHost);

    // Print the top-left corner of the result
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++) std::cout << C[i * N + j] << " ";
        std::cout << std::endl;
    }

    // Free memory
    cudaFree(dev_A); cudaFree(dev_B); cudaFree(dev_C);
    cudaFreeHost(A); cudaFreeHost(B); cudaFreeHost(C);
    return 0;
}
```

• In this program, the `matmul` kernel takes in the two input matrices `A` and `B`, the output matrix `C`, and the size of the matrices `N`. The kernel uses the `blockIdx.x`, `blockIdx.y`, `threadIdx.x` and `threadIdx.y` variables to calculate the indices of the current thread. If the indices are less than `N`, the kernel accumulates `Pvalue += A[Row*N+k] * B[k*N+Col]` and stores Pvalue in `C[Row*N+Col]`.
• In the `main` function, the program first allocates memory for the input and output matrices on the host and initializes them. Then it allocates memory for the matrices on the device and copies the data from the host to the device using `cudaMemcpy`.
• Next, the program launches the kernel with the appropriate grid and block dimensions. The kernel uses a 2D grid of thread blocks to perform the matrix multiplication in parallel.
• Finally, it copies the result from device to host using cudaMemcpy, prints part of it with a nested for loop, and frees the memory used.
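A host-side spot check is a useful habit with CUDA kernels. The sketch below (function name illustrative) recomputes a few entries on the CPU and compares them with the GPU result; it assumes the host arrays A, B, C and the size N from the program above. Note that with these initial values the products overflow 32-bit int for large N, but the CPU and GPU wrap around identically, so the comparison remains valid.

```cpp
// Recompute a few sample entries on the CPU and compare with the GPU result.
bool verify(const int* A, const int* B, const int* C, int N, int samples = 16) {
    for (int s = 0; s < samples; s++) {
        int i = (s * 7919) % N;      // arbitrary deterministic sample points
        int j = (s * 104729) % N;
        int expected = 0;
        for (int k = 0; k < N; k++)
            expected += A[i * N + k] * B[k * N + j];
        if (C[i * N + j] != expected) {
            std::cout << "Mismatch at (" << i << ", " << j << ")" << std::endl;
            return false;
        }
    }
    return true;
}
```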
Group 2

Measuring the performance enhancement of a parallel Quicksort algorithm using MPI

1. Compare the execution time of the parallel algorithm with that of the sequential algorithm.
2. Divide the input array into equal-sized partitions, and have each process sort its partition using the standard Quicksort algorithm.
3. The processes then exchange their partitions with each other and use a parallel merge operation to merge the sorted partitions into a single, sorted array.
4. To evaluate the performance enhancement, run the parallel Quicksort algorithm with different numbers of processes and compare the execution time with the execution time of the sequential Quicksort algorithm.
5. You can also vary the size of the input array and observe how the execution time changes.
6. Additionally, you can measure the speedup and efficiency of the parallel algorithm by using the following formulas:

Speedup = T_seq / T_par
Efficiency = Speedup / P

where T_seq = execution time of the sequential algorithm, T_par = execution time of the parallel algorithm, and P = number of processes used.
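As a worked example (illustrative numbers, not measured results): if the sequential sort takes T_seq = 8.0 s and the parallel version takes T_par = 2.5 s on P = 4 processes, then Speedup = 8.0 / 2.5 = 3.2 and Efficiency = 3.2 / 4 = 0.8, i.e. each process is doing useful work about 80% of the time.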
Parallel Quicksort
1. Let's start with recursive decomposition: the list is partitioned serially and each of the subproblems is handled by a different processor.
2. The time for this algorithm is lower-bounded by Ω(n)!
3. Can we parallelize the partitioning step? In particular, if we can use n processors to partition a list of length n around a pivot in O(1) time, we have a winner.
4. This is difficult to do on real machines, though.

PRAM Formulation
1. We assume a CRCW (concurrent read, concurrent write) PRAM, with concurrent writes resulting in an arbitrary write succeeding.
2. The formulation works by creating pools of processors. Every processor is assigned to the same pool initially and has one element.
3. Each processor attempts to write its element to a common location (for the pool).
4. Each processor tries to read back the location. If the value read back is greater than the processor's value, it assigns itself to the 'left' pool; else, it assigns itself to the 'right' pool.
5. Each pool performs this operation recursively.
6. Note that the algorithm generates a tree of pivots. The depth of the tree determines the parallel runtime; its average value is O(log n).

Parallelizing Quicksort: Shared Address Space Formulation
1. The only thing we have not described is the global reorganization (merging) of local lists to form L and U.
2. The problem is one of determining the right location for each element in the merged list.
3. Each processor computes the number of elements locally less than and greater than the pivot.
4. It computes two sum-scans to determine the starting location for its elements in the merged L and U lists.
5. Once it knows the starting locations, it can write its elements safely.
6. The parallel time depends on the split and merge time, and on the quality of the pivot. The latter is an issue independent of parallelism, so we focus on the first aspect, assuming ideal pivot selection.
7. The algorithm executes in four steps: (i) determine and broadcast the pivot; (ii) locally rearrange the array assigned to each process; (iii) determine the locations in the globally rearranged array that the local elements will go to; and (iv) perform the global rearrangement.
8. The first step takes time Θ(log p), the second Θ(n/p), the third Θ(log p), and the fourth Θ(n/p). The overall complexity of splitting an n-element array is Θ(n/p) + Θ(log p).

Program

```cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <mpi.h>
using namespace std;

// Function to partition the array around the last element
int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j <= high - 1; j++) {
        if (arr[j] <= pivot) {
            i++;
            swap(arr[i], arr[j]);
        }
    }
    swap(arr[i + 1], arr[high]);
    return i + 1;
}

// Function to perform quicksort on the partition
void quicksort(int arr[], int low, int high) {
    if (low < high) {
        int pivot = partition(arr, low, high);
        quicksort(arr, low, pivot - 1);
        quicksort(arr, pivot + 1, high);
    }
}

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int n = 100;   // assumed divisible by the number of processes
    int* arr = new int[n];
    int* recvbuf = new int[n];

    // Fill the array with random values on the root process
    if (rank == 0) {
        for (int i = 0; i < n; i++) arr[i] = rand() % 100;
    }

    // Measure the parallel phase: scatter + local sort + gather
    double start_time = MPI_Wtime();

    // Divide the array into equal-sized partitions for each process
    int sub_arr_size = n / size;
    int* sub_arr = new int[sub_arr_size];
    MPI_Scatter(arr, sub_arr_size, MPI_INT, sub_arr, sub_arr_size, MPI_INT, 0,
                MPI_COMM_WORLD);

    // Sort the local partition using quicksort
    quicksort(sub_arr, 0, sub_arr_size - 1);
    // Gather the sorted partitions from each process
    MPI_Gather(sub_arr, sub_arr_size, MPI_INT, recvbuf, sub_arr_size, MPI_INT, 0,
               MPI_COMM_WORLD);
    double end_time = MPI_Wtime();
    double parallel_execution_time = end_time - start_time;

    if (rank == 0) {
        // Note: recvbuf now holds 'size' sorted runs, one per process; a final
        // merge step (as described above) is still required for a fully
        // sorted array.
        for (int i = 0; i < n; i++) cout << recvbuf[i] << " ";
        cout << endl;

        // Measure the execution time of the sequential quicksort algorithm
        start_time = MPI_Wtime();
        quicksort(arr, 0, n - 1);
        end_time = MPI_Wtime();
        double sequential_execution_time = end_time - start_time;

        // Calculate speedup and efficiency
        double speedup = sequential_execution_time / parallel_execution_time;
        double efficiency = speedup / size;
        cout << "Sequential execution time: " << sequential_execution_time << endl;
        cout << "Parallel execution time: " << parallel_execution_time << endl;
        cout << "Speedup: " << speedup << endl;
        cout << "Efficiency: " << efficiency << endl;
    }

    delete[] arr;
    delete[] recvbuf;
    delete[] sub_arr;
    MPI_Finalize();
    return 0;
}
```
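MPI programs are compiled with the MPI wrapper compiler and launched through mpirun; a typical invocation (file name illustrative) is `mpicxx parallel_quicksort.cpp -o parallel_quicksort` followed by `mpirun -np 4 ./parallel_quicksort`, varying -np to study how speedup and efficiency change with the number of processes.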
Implement Huffman Encoding on GPU

```cpp
#include <climits>
#include <iostream>
#include <cuda_runtime.h>

// Simplified sketch of one combining step of Huffman tree construction.
// Note: as in the original sample, every thread scans the same frequency
// array; a fully correct Huffman build must combine the two lowest-frequency
// nodes one step at a time (serially, or with atomic updates). This listing
// is a starting-point sketch, not a complete encoder.
__global__ void buildHuffmanTree(int* frequencies, int* tree, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {
        // Find the two lowest frequency nodes
        int min1 = INT_MAX, min2 = INT_MAX;
        int minIndex1 = -1, minIndex2 = -1;
        for (int j = 0; j < n; j++) {
            if (frequencies[j] != 0 && frequencies[j] < min1) {
                min2 = min1; minIndex2 = minIndex1;
                min1 = frequencies[j]; minIndex1 = j;
            } else if (frequencies[j] != 0 && frequencies[j] < min2) {
                min2 = frequencies[j]; minIndex2 = j;
            }
        }
        // Combine the two lowest frequency nodes into a new node: its
        // frequency is the sum, and its children are recorded in the tree
        // array (left child at newNodeIndex, right child at newNodeIndex + n).
        int newNodeIndex = n + i;
        frequencies[newNodeIndex] = min1 + min2;
        tree[newNodeIndex] = minIndex1;
        tree[newNodeIndex + n] = minIndex2;
    }
}

int main() {
    int n = 256;
    int* frequencies;
    int* tree;
    // Allocate room for the n leaves plus the internal nodes created above.
    cudaMalloc(&frequencies, 2 * n * sizeof(int));
    cudaMalloc(&tree, 4 * n * sizeof(int));

    // Initialize frequencies
    // ...

    int numBlocks = (n + 256 - 1) / 256;
    buildHuffmanTree<<<numBlocks, 256>>>(frequencies, tree, n);

    // Encode the data using the Huffman tree
    // ...

    cudaFree(frequencies);
    cudaFree(tree);
    return 0;
}
```

1. The program defines a kernel function buildHuffmanTree that takes three parameters: an array of frequencies, an array for the Huffman tree, and the number of symbols. The kernel function is executed on the GPU, with each thread attempting one combining step of the tree construction.
2. The kernel first finds the two lowest-frequency nodes in the array of frequencies, and combines them into a new node. The new node is added to the tree array, and its frequency is the sum of the frequencies of the two lowest-frequency nodes.
3. In the main function, the program allocates memory on the GPU for the frequency and tree arrays (sized to hold the internal nodes as well). It then initializes the array of frequencies (this part of the code is not shown in the example). Finally, it launches the buildHuffmanTree kernel function on the GPU by specifying the number of blocks and threads per block, and passing in the necessary parameters. After the kernel function finishes executing, the program frees the memory allocated on the GPU.
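For intuition, a classic worked example of Huffman construction (the frequencies used by Cormen et al.): symbols a–f with frequencies {a:45, b:13, c:12, d:16, e:9, f:5} yield the prefix codes a=0, c=100, b=101, d=111, f=1100, e=1101, so the most frequent symbol gets a 1-bit code and the rarest symbols get 4-bit codes.

SINHGAD TECHNICAL EDUCATION SOCIETY'S
RMD SINHGAD SCHOOL OF ENGINEERING, Warje, Pune 411058
Department of Computer Engineering

LABORATORY MANUAL, AY: 2022-23
Laboratory Practice VI: Image Processing Laboratory
BE Computer Engineering (2019 Pattern), Semester II
Subject Code: 410252 (B)

TEACHING SCHEME: Practical: 2 Hrs/Week; CREDIT: 01
EXAMINATION: TW: 50 Marks

Prepared by: Mr. Saurabh Vijay Parhad (Assistant Professor, Department of Computer Engineering)

Vision
"We are committed to produce not only good engineers but good human beings, also."

Mission
"Holistic development of students and teachers is what we believe in and work for. We strive to achieve this by imbibing a unique value system, transparent work culture, excellent academic and physical environment conducive to learning, creativity and technology transfer. Our mandate is to generate, preserve and share knowledge for developing a vibrant society."

Programme Outcomes (POs)
PO1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2. Problem analysis: Identify, formulate, research literature, and analyze complex engineering problems, reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3. Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for public health and safety, and the cultural, societal, and environmental considerations.
PO4. Conduct investigations of complex problems: Use research-based knowledge and research methods, including design of experiments, analysis and interpretation of data, and synthesis of information, to provide valid conclusions.
PO5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools, including prediction and modelling, to complex engineering activities with an understanding of the limitations.
PO6. The engineer and society: Apply reasoning informed by contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to professional engineering practice.
PO7.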
Environment and sustainability: Understand the impact of professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.
PO8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and the norms of engineering practice.
PO9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams and in multidisciplinary settings.
PO10. Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11. Project management and finance: Demonstrate knowledge and understanding of engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects in multidisciplinary environments.
PO12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.

Programme Specific Outcomes (PSOs)
A Computer Engineering graduate will be able to:
PSO1: Project Development: Successfully complete hardware and/or software related system or application projects, using the phases of the project development life cycle, to meet the requirements of service and product industries and government projects, and to automate projects from other engineering streams.
PSO2: Domain Expertise: Demonstrate effective application of knowledge gained from different computer domains such as data structures, databases, operating systems, computer networks, security and parallel programming in project development, research and higher education.
PSO3: Career Development: Achieve a successful career and entrepreneurship: the ability to employ modern computer languages, environments and platforms in creating innovative career paths, to be an entrepreneur, and a zest for higher studies.

INDEX
1. Read any image. Display the histogram, equalized histogram, and image with equalized histogram.
2. Read any image. Display the outputs of contrast stretching and intensity level slicing.
3. Compare the results of any three edge detection algorithms on the same image dataset and analyze the results.
4. Compare the results of any two image segmentation algorithms on the same image dataset.
5. Image compression using any three compression techniques; compare the results.
6. Implement image segmentation to detect an object in the background of an image.

Experiment 1
Assignment No: 1
Title: Read any image. Display the histogram, equalized histogram, and image with equalized histogram.
Roll No. / Class: B.E. / Subject: Image Processing / Sign

ASSIGNMENT 1
TITLE: Display the histogram, equalized histogram, and image with equalized histogram.
OBJECTIVES: To understand the concepts of histogram and histogram equalization.

THEORY-CONCEPT:

Histogram
A histogram is a graphical representation of the intensity distribution of an image. In simple terms, it represents the number of pixels for each intensity value considered. In a histogram plot, the X-axis represents the tonal scale (black at the left and white at the right), and the Y-axis represents the number of pixels in the image. The histogram thus shows the number of pixels for each brightness level (from black to white); when there are more pixels at a certain brightness level, the peak at that level is higher.

Histogram Equalization
Histogram equalization is used to enhance contrast. It does not necessarily always increase contrast: there can be cases where histogram equalization makes an image worse, and the contrast is decreased.
Histogram equalization is an image processing technique used to improve contrast in images. It accomplishes this by effectively spreading out the most frequent intensity values, i.e., stretching out the intensity range of the image. This method usually increases the global contrast of images whose usable data is represented by close contrast values, allowing areas of lower local contrast to gain a higher contrast.
A color histogram of an image represents the number of pixels in each type of color component. Histogram equalization cannot be applied separately to the Red, Green and Blue components of the image, as this leads to dramatic changes in the image's color balance. However, if the image is first converted to another color space, such as the HSL/HSV color space, then the algorithm can be applied to the luminance or value channel without changing the hue and saturation of the image.

Adaptive Histogram Equalization
Adaptive histogram equalization differs from ordinary histogram equalization in that the adaptive method computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the lightness values of the image. It is therefore suitable for improving local contrast and enhancing the definition of edges in each region of an image.

Contrast Limited Adaptive Histogram Equalization
Contrast Limited AHE (CLAHE) differs from adaptive histogram equalization in its contrast limiting: the contrast-limiting procedure is applied to each neighborhood from which a transformation function is derived. CLAHE was developed to prevent the over-amplification of noise that adaptive histogram equalization can give rise to.

Steps Involved
• Get the input image
• Generate the histogram for the image
• Find the local minima of the image
• Divide the histogram based on the local minima
• Assign specific gray levels to each partition of the histogram
• Apply histogram equalization to each partition

Algorithm
1. Compute the histogram of the pixel values of the input image. The histogram places the value of each pixel f[x, y] into one of L uniformly spaced buckets h[i], where L = 2^8 = 256 for an 8-bit image of dimension M × N.
2. Calculate the cumulative distribution function:
   CDF(i) = h[0] + h[1] + ... + h[i]
3. Scale the input image using the cumulative distribution function to produce the output image:
   g[x, y] = round( (CDF(f[x, y]) − CDFmin) / (M·N − CDFmin) × (L − 1) )
   where CDFmin is the smallest non-zero value of the cumulative distribution function.
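A minimal sketch of this algorithm in C++, operating on a raw 8-bit grayscale buffer (row-major, M rows by N columns; names are illustrative, and the image is assumed not to be constant):

```cpp
#include <cmath>
#include <vector>
using namespace std;

// Histogram equalization of an 8-bit grayscale image, following the
// three steps described above.
vector<unsigned char> equalize(const vector<unsigned char>& img, int M, int N) {
    const int L = 256;
    // Step 1: histogram
    vector<long long> h(L, 0);
    for (unsigned char v : img) h[v]++;
    // Step 2: cumulative distribution function
    vector<long long> cdf(L, 0);
    long long run = 0;
    for (int i = 0; i < L; i++) { run += h[i]; cdf[i] = run; }
    long long cdf_min = 0;
    for (int i = 0; i < L; i++)
        if (cdf[i] > 0) { cdf_min = cdf[i]; break; }
    // Step 3: scale each pixel through the CDF
    vector<unsigned char> out(img.size());
    double denom = (double)M * N - cdf_min;   // non-zero unless the image is constant
    for (size_t p = 0; p < img.size(); p++)
        out[p] = (unsigned char)llround((cdf[img[p]] - cdf_min) / denom * (L - 1));
    return out;
}
```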
CLAHE (Contrast Limited Adaptive Histogram Equalization)
The histogram equalization described above considers the global contrast of the image, and in many cases this is not a good idea: even though the contrast of the image is improved, we may lose some information due to over-brightness. The reason is that the histogram is not confined to a local region. Adaptive histogram equalization helps to solve this issue. In this method, the image is divided into small blocks, and each of these blocks is histogram equalized. (The original manual shows the output of adaptive histogram equalization on the same image at this point.)

CONCLUSION/ANALYSIS: We have studied the concepts of the histogram, the equalized histogram, and the image with an equalized histogram.

Experiment 2
Assignment No: 2
Title: Display the outputs of contrast stretching and intensity level slicing.
Roll No. / Class: B.E. / Subject: Image Processing / Sign

ASSIGNMENT 2
TITLE: Display the outputs of contrast stretching and intensity level slicing.
OBJECTIVES:
1. Understand the concept of contrast stretching of an image.
2. Understand the concept of intensity level slicing of an image.

THEORY-CONCEPT:

Contrast Stretching
Contrast stretching is an image enhancement method which attempts to improve an image by stretching the range of intensity values. Here, we stretch the minimum and maximum intensity values present to the possible minimum and maximum intensity values.
Example: if the minimum intensity value (r_min) present in the image is 100, it is stretched to the possible minimum intensity value 0. Likewise, if the maximum intensity value (r_max) is less than the possible maximum intensity value 255, it is stretched out to 255 (0–255 is taken as the standard minimum and maximum intensity range for 8-bit images).
Note: contrast stretching is only possible if the minimum and maximum intensity values present are not already equal to the possible minimum and maximum values; otherwise, the image generated after contrast stretching is the same as the input image.

General formula for contrast stretching:
s = (r − r_min) / (r_max − r_min) × (I_max − I_min) + I_min
For I_min = 0 and I_max = 255 (for a standard 8-bit grayscale image):
s = (r − r_min) / (r_max − r_min) × 255
where r = current pixel intensity value, r_min = minimum intensity value present in the whole image, and r_max = maximum intensity value present in the whole image.
Note: the output intensity value s should be rounded to the nearest integer. (The original manual shows an input/output image pair as an example at this point.)
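A minimal sketch of this formula in C++, on an 8-bit grayscale buffer (names illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>
using namespace std;

// Contrast stretching of an 8-bit grayscale buffer to the full 0-255 range.
void contrast_stretch(vector<unsigned char>& img) {
    auto [lo, hi] = minmax_element(img.begin(), img.end());
    int rmin = *lo, rmax = *hi;
    if (rmin == rmax) return;   // flat image: stretching has no effect
    for (unsigned char& v : img)
        v = (unsigned char)lround((v - rmin) / (double)(rmax - rmin) * 255.0);
}
```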
Gray Level Slicing
Gray level slicing is a technique used in image processing to extract a set of pixels from an image based on their gray level values. This technique can be used to select a specific range of gray levels from an image, or to select all pixels with a gray level value above or below a certain threshold.
Gray level slicing is commonly used in image processing tasks such as image thresholding, where a specific range of gray levels is selected from an image and all other pixels are set to black or white. The technique can also be used to extract features from an image, or to remove background noise.
In a grayscale photograph, the gray levels range from black, through intermediate grays, to white. Slicing acts like a band-pass filter on intensities: levels inside the selected band are highlighted, while the remaining levels are either suppressed or kept at their original value, depending on the desired effect.
Bit-plane slicing is a related approach that decomposes a gray-level image into a set of binary images, one for each bit of the pixel representation; a bit plane is the set of bits occupying the same position in the binary representation of each pixel (or each sample of a digital signal such as an image or a sound). Other point transformations behave differently: a log transformation expands the darker range of an image, making detail in dark patches more visible, while the identity transformation maps each intensity value to itself and leaves the image unchanged.
In summary, gray level slicing creates a new image from an existing one by thresholding, meaning that only pixels that meet a certain criterion are included (highlighted) in the new image. It is often used to emphasize certain features in an image, or to remove background noise.

What Is a Gray Level in Image Processing?
A gray-level (grayscale) image stores a single intensity value per pixel, between black and white. In its histogram, the horizontal axis runs from 0 to 255, and the vertical axis gives the number of pixels at each level; an 8-bit gray-level image therefore has 256 levels of gray. In general, gray-level resolution is determined by the number of bits k used per pixel: a k-bit image has 2^k gray levels, so for images with 8 bits of depth the levels run from 0 to 255.

Intensity Level Slicing in Image Processing
Intensity level slicing highlights a chosen set of specific intensities in an image, so that the selected gray-level region is distinguished from the rest of the image. For example, the region of interest might lie between levels 80 and 150: pixels in that band are highlighted, and the remaining pixels are either suppressed or left unchanged.
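A minimal sketch of intensity level slicing in C++ (names illustrative), using the 80–150 band mentioned above:

```cpp
#include <vector>
using namespace std;

// Intensity level slicing: highlight pixels whose gray level lies in
// [low, high]. With preserve == true, the other pixels keep their original
// value; otherwise they are forced to black.
void intensity_slice(vector<unsigned char>& img, int low, int high, bool preserve) {
    for (unsigned char& v : img) {
        if (v >= low && v <= high) v = 255;   // highlight the selected band
        else if (!preserve)        v = 0;     // suppress everything else
    }
}

// Example: intensity_slice(img, 80, 150, true);
```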
Intensity Transformations in Images
Intensity transformations can be applied in many ways, and there are several common types. Contrast stretching widens the intensity range, increasing the differences between bright and dark pixels; contrast reduction compresses the range, reducing those differences. Brightening raises the overall brightness of the original image, while the negative (inverse) transformation makes the bright pixels of the original image dark and the dark pixels bright.

CONCLUSION/ANALYSIS: We have studied the concepts of contrast stretching and intensity level slicing.

Experiment 3
Assignment No: 3
Title: Study any three edge detection algorithms.
Roll No. / Class: B.E. / Subject: Image Processing / Sign

ASSIGNMENT 3
TITLE: Compare any three edge detection algorithms.
OBJECTIVES: To study and understand different edge detection algorithms.

THEORY:
Edges are significant local changes of intensity in a digital image. An edge can be defined as a set of connected pixels that forms a boundary between two disjoint regions. There are three types of edges:
• Horizontal edges
• Vertical edges
• Diagonal edges
Edge detection is a method of segmenting an image into regions of discontinuity. It is a widely used technique in digital image processing tasks such as:
• pattern recognition
• image morphology
• feature extraction
Edge detection allows users to observe the features of an image where there is a significant change in gray level; this texture indicates the end of one region in the image and the beginning of another. It reduces the amount of data in an image while preserving the structural properties of the image.

Edge detection operators are of two types:
• Gradient-based operators, which compute first-order derivatives of a digital image, such as the Sobel operator, Prewitt operator and Roberts operator.
• Gaussian-based operators, which compute second-order derivatives of a digital image, such as the Canny edge detector and the Laplacian of Gaussian.

Sobel Operator:
The Sobel operator is a discrete differentiation operator. It computes a gradient approximation of the image intensity function for image edge detection. At each pixel of the image, the Sobel operator produces the corresponding gradient vector or its norm. It uses two 3 × 3 kernels (masks) which are convolved with the input image to calculate the vertical and horizontal derivative approximations respectively:
Gx = [ -1 0 +1 ; -2 0 +2 ; -1 0 +1 ],  Gy = [ -1 -2 -1 ; 0 0 0 ; +1 +2 +1 ]
Advantages:
1. Simple and time-efficient computation
2. Very easy at searching for smooth edges
Limitations:
1. Diagonal direction points are not always preserved
2. Highly sensitive to noise
3. Not very accurate in edge detection
4. Detection of thick and rough edges does not give appropriate results
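A minimal sketch of the Sobel operator in C++, on an 8-bit grayscale buffer (row-major, M rows by N columns; names illustrative; border pixels are simply left at zero):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>
using namespace std;

// Sobel gradient magnitude for an 8-bit grayscale image.
vector<unsigned char> sobel(const vector<unsigned char>& img, int M, int N) {
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    vector<unsigned char> out(img.size(), 0);
    for (int y = 1; y < M - 1; y++) {
        for (int x = 1; x < N - 1; x++) {
            int gx = 0, gy = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int p = img[(y + dy) * N + (x + dx)];
                    gx += GX[dy + 1][dx + 1] * p;
                    gy += GY[dy + 1][dx + 1] * p;
                }
            // gradient magnitude, clamped to the 8-bit range
            int mag = (int)lround(sqrt((double)gx * gx + (double)gy * gy));
            out[y * N + x] = (unsigned char)min(mag, 255);
        }
    }
    return out;
}
```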
Prewitt Operator:
This operator is similar to the Sobel operator. It also detects the vertical and horizontal edges of an image, and it is one of the best ways to estimate the orientation and magnitude of edges in an image. It likewise uses a pair of 3 × 3 kernels (masks).
Advantages:
1. Good performance in detecting vertical and horizontal edges
2. One of the best operators for detecting the orientation of edges
Limitations:
1. The magnitudes of the coefficients are fixed and cannot be changed
2. Diagonal direction points are not always preserved

Roberts Operator:
This gradient-based operator computes the sum of the squares of the differences between diagonally adjacent pixels in an image through discrete differentiation; the gradient approximation is then formed. It uses a pair of 2 × 2 kernels (masks).
Advantages:
1. Detection of edges and their orientation is very easy
2. Diagonal direction points are preserved
Limitations:
1. Very sensitive to noise
2. Not very accurate in edge detection

Canny Operator:
The Canny operator is a Gaussian-based operator for detecting edges. It is not very susceptible to noise, and it extracts image features without affecting or altering them. The Canny edge detector is an advanced algorithm derived from earlier work on the Laplacian of Gaussian operator, and it is widely used as an optimal edge detection technique. It detects edges based on three criteria:
1. Low error rate
2. Edge points must be accurately localized
3. There should be just one single edge response
Advantages:
1. Good localization
2. It extracts image features without altering them
3. Less sensitive to noise
Limitations:
1. There can be false zero crossings
2. Complex computation and time consuming

Some real-world applications of image edge detection:
• medical imaging, study of anatomical structure
• locating objects in satellite images
• automatic traffic control systems
• face recognition and fingerprint recognition

CONCLUSION/ANALYSIS: We have studied and compared different types of edge detection algorithms in image processing and their real-world applications.

Experiment 4
Assignment No: 4
Title: Study and understand image segmentation algorithms.
Roll No. / Class: B.E. / Subject: Image Processing / Sign

ASSIGNMENT 4
TITLE: Image segmentation techniques.
OBJECTIVES: To study and understand image segmentation techniques.

THEORY:

What Is Image Segmentation?
Image segmentation is a method of dividing a digital image into subgroups called image segments, reducing the complexity of the image and enabling further processing or analysis of each image segment. Technically, segmentation is the assignment of labels to pixels to identify objects, people, or other important elements in the image.
A common use of image segmentation is in object detection. Instead of processing the entire image, a common practice is to first use an image segmentation algorithm to find objects of interest in the image. The object detector can then operate on a bounding box already defined by the segmentation algorithm. This prevents the detector from processing the entire image, improving accuracy and reducing inference time.
Image segmentation is a key building block of computer vision technologies and algorithms. It is used in many practical applications, including medical image analysis, computer vision for autonomous vehicles, face recognition and detection, video surveillance, and satellite image analysis.

How Does Image Segmentation Work?
Image segmentation is a function that takes image inputs and produces an output. The output is a mask or a matrix whose elements specify the object class or instance to which each pixel belongs.
Several relevant heuristics, or high-level image features, can be useful for image segmentation. These features are the basis for standard image segmentation algorithms that use clustering techniques like edges and histograms.
There are also various neural network designs and implementations suitable for image segmentation. They usually contain the same basic components:
• An encoder: a series of layers that extract image features using progressively deeper, narrower filters. The encoder might be pre-trained on a similar task (e.g., image recognition), allowing it to leverage its existing knowledge to perform segmentation tasks.
• A decoder: a series of layers that gradually convert the encoder's output into a segmentation mask corresponding to the input image's pixel resolution.
• Skip connections: multiple long-range neural network connections allowing the model to identify features at different scales to enhance model accuracy.

Image Segmentation Techniques:

1. Edge-Based Segmentation
Edge-based segmentation is a popular image processing technique that identifies the edges of various objects in a given image.
It helps locate the features of associated objects in the image using the information from the edges. Edge detection helps strip images of redundant information, reducing their size and facilitating analysis. Edge-based segmentation algorithms identify edges based on contrast, texture, color, and saturation variations. They can accurately represent the borders of objects in an image using edge chains made up of the individual edges.

2. Threshold-Based Segmentation
Thresholding is the simplest image segmentation method, dividing pixels based on their intensity relative to a given value, or threshold. It is suitable for segmenting objects whose intensity is higher than that of other objects or the background. The threshold value T can work as a constant in low-noise images; in some cases, it is possible to use dynamic thresholds. Thresholding divides a grayscale image into two segments based on the pixels' relationship to T, producing a binary image.
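A minimal sketch of global thresholding in C++ (names illustrative):

```cpp
#include <vector>
using namespace std;

// Global thresholding: pixels brighter than T become foreground (255),
// all others become background (0), producing a binary image.
vector<unsigned char> threshold_image(const vector<unsigned char>& img, int T) {
    vector<unsigned char> out(img.size());
    for (size_t i = 0; i < img.size(); i++)
        out[i] = (img[i] > T) ? 255 : 0;
    return out;
}
```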
Conclusion: Successfully detected the object in the background of an image using image segmentation techniques.

Experiment: 5
Assignment No: 5
Title: Image Compression Techniques
Roll_no | Class: B.E. | Subject: Image Processing | Sign

ASSIGNMENT: 5
TITLE: Image Compression Techniques.
OBJECTIVES: To study and understand image compression techniques.

THEORY:
IMAGE COMPRESSION:
Image compression addresses the problem of reducing the amount of data required to represent a digital image. It is a process intended to yield a compact representation of an image, thereby reducing the image storage/transmission requirements. Compression is achieved by the removal of one or more of the three basic data redundancies:
1. Coding Redundancy
2. Interpixel Redundancy
3. Psychovisual Redundancy
Coding redundancy is present when less-than-optimal code words are used. Interpixel redundancy results from correlations between the pixels of an image. Psychovisual redundancy is due to data that is ignored by the human visual system (i.e., visually non-essential information). Image compression techniques reduce the number of bits required to represent an image by taking advantage of these redundancies. An inverse process called decompression (decoding) is applied to the compressed data to recover the reconstructed image. The objective of compression is to reduce the number of bits as much as possible while keeping the resolution and visual quality of the reconstructed image as close to the original as possible. Image compression systems are composed of two distinct structural blocks: an encoder and a decoder.

STEPS INVOLVED IN COMPRESSING AN IMAGE:
[Block diagram: input image f(x, y) -> encoder (source encoder -> channel encoder) -> channel -> decoder (channel decoder -> source decoder) -> output image]

The image f(x, y) is fed into the encoder, which creates a set of symbols from the input data and uses them to represent the image. If we let n1 and n2 denote the number of information-carrying units (usually bits) in the original and encoded images respectively, the compression ratio is

CR = n1 / n2

and the relative data redundancy RD of the original data set is defined as

RD = 1 - 1/CR

For example, if the original image requires n1 = 1,000,000 bits and the encoded image n2 = 250,000 bits, then CR = 4 and RD = 1 - 1/4 = 0.75; that is, 75% of the original data is redundant. As shown in the diagram, the encoder is responsible for reducing the coding, interpixel, and psychovisual redundancies of the input image.

2. IMAGE COMPRESSION TECHNIQUES
Image compression techniques are broadly classified into two categories, depending on whether or not an exact replica of the original image can be reconstructed from the compressed image:
1. Lossless techniques
2. Lossy techniques

Lossless compression techniques
In lossless compression, the original image can be perfectly recovered from the compressed (encoded) image. These techniques are also called noiseless, since they do not add noise to the signal (image). They are also known as entropy coding, since they use statistical/decomposition techniques to eliminate or minimize redundancy. Lossless compression is used only for a few applications with stringent requirements, such as medical imaging. The following techniques are included in lossless compression:
1. Run-length encoding
2. Huffman encoding
3. LZW coding
4. Area coding

Lossy compression techniques:
Lossy schemes provide much higher compression ratios than lossless schemes. They are widely used, since the quality of the reconstructed image is adequate for most applications. Under such a scheme, the decompressed image is not identical to the original image, but reasonably close to it. Major performance considerations of a lossy compression scheme include:
➢ Compression ratio
➢ Signal-to-noise ratio
➢ Speed of encoding and decoding
Lossy compression techniques include the following schemes:
➢ Lossy predictive coding
➢ Transform coding
➢ Wavelet coding

LOSSLESS COMPRESSION TECHNIQUES
1. Run-Length Encoding:
Run-length encoding (RLE) is a very simple form of data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs: for example, relatively simple graphic images such as icons, line drawings, and animations. Let us take a hypothetical single scan line, with B representing a black pixel and W representing white:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
If we apply the run-length encoding (RLE) algorithm to this scan line, we get: 12W1B12W3B24W1B14W.
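A minimal sketch of RLE on such a scan line follows (plain Python; the function names are illustrative, not taken from any prescribed library):

# rle.py: run-length encode/decode sketch for a line of pixel symbols
import re
from itertools import groupby

def rle_encode(line):
    # Collapse each run of identical symbols into "<count><symbol>"
    return "".join(str(len(list(run))) + symbol for symbol, run in groupby(line))

def rle_decode(encoded):
    # Each token is a count followed by one symbol, e.g. "12W"
    return "".join(sym * int(cnt) for cnt, sym in re.findall(r"(\d+)(\D)", encoded))

line = "W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 24 + "B" + "W" * 14
code = rle_encode(line)
print(code)                      # prints 12W1B12W3B24W1B14W
assert rle_decode(code) == line  # decoding restores the original scan line

Here the 67-symbol scan line compresses to an 18-character code, illustrating why RLE works well on images with long uniform runs.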
2. Huffman Coding:
Huffman coding is a general technique for coding symbols based on their statistical occurrence frequencies (probabilities). The pixels in the image are treated as symbols: symbols that occur more frequently are assigned a smaller number of bits, while symbols that occur less frequently are assigned a relatively larger number of bits. Dr. David A. Huffman proposed Huffman coding in 1952 as a method for constructing minimum-redundancy codes. Huffman's greedy algorithm looks at the occurrence of each character and encodes it as a binary string in an optimal way, allowing code word lengths to vary: shorter codes are assigned to the most frequently used symbols, and longer codes to the symbols that appear less frequently (that is where the statistical part comes in). Code word lengths are thus no longer fixed, as in ASCII; they vary and are shorter for the more frequently used characters. A Huffman code is a prefix code, meaning that the (binary) code of any symbol is never the prefix of the code of any other symbol. Most image coding standards use lossy techniques in the earlier stages of compression and apply Huffman coding as the final step. (A sketch of Huffman code construction follows at the end of this section.)

3. LZW Coding:
LZW (Lempel-Ziv-Welch) is a dictionary-based coding. Dictionary-based coding can be static or dynamic: in static dictionary coding, the dictionary is fixed during the encoding and decoding processes, while in dynamic dictionary coding, the dictionary is updated on the fly. LZW is widely used in the computer industry and is implemented as the compress command on UNIX. LZW compression has been integrated into a variety of mainstream image file formats, including the Graphics Interchange Format (GIF), the Tagged Image File Format (TIFF), and the Portable Document Format (PDF). In the LZW coding process, a "codebook" or "dictionary" containing the source symbols to be coded is constructed. For 8-bit monochrome images, the first 256 words of the dictionary are assigned to the gray values 0, 1, 2, ..., 255.

4. Area Coding:
Area coding is an enhanced form of run-length coding that reflects the two-dimensional character of images, and is a significant advance over the other lossless methods. For coding an image, it does not make much sense to interpret it as a sequential stream, since it is in fact an array of sequences building up a two-dimensional object. Area coding algorithms try to find rectangular regions with the same characteristics. These regions are coded in a descriptive form, as an element with two points and a certain structure. This type of coding can be highly effective, but it has the drawback of being a nonlinear method, which cannot be implemented in hardware.
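As referenced under Huffman coding above, the following is a minimal sketch of code construction (plain Python; the symbol frequencies are made up for illustration):

# huffman.py: Huffman code construction sketch using a min-heap
import heapq

def huffman_codes(freqs):
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far});
    # the tie-breaker keeps tuple comparison away from the dicts.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}        # left branch gets 0
        merged.update({s: "1" + code for s, code in c2.items()})  # right branch gets 1
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical pixel-symbol frequencies; frequent symbols get shorter codes
print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))

Because every code corresponds to a leaf of the merge tree, no code word can be the prefix of another, which is exactly the prefix property described above.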
Conclusion: We have studied different image compression techniques.

Experiment: 6
Assignment No: 6
Title: Mini Project - Implement image segmentation to detect an object in the background of an image
Roll_no | Class: B.E. | Subject: Image Processing | Sign

ASSIGNMENT: 6
TITLE: Implement image segmentation to detect an object in the background of an image.
OBJECTIVES: To study and understand image segmentation for detecting an object in the background of an image.

THEORY:
Image Segmentation:
Let us understand image segmentation using a simple example. Consider the below image: there is only one object here, a dog. We can build a straightforward cat-dog classifier model and predict that there is a dog in the given image. But what if we have both a cat and a dog in a single image? In that case, we can train a multi-label classifier. Now there is another caveat: we will not know the location of either animal in the image.

That is where image localization comes into the picture (no pun intended!). It helps us identify the location of a single object in the given image. When multiple objects are present, we rely on object detection (OD), which predicts the location along with the class of each object. But before detecting the objects, and even before classifying the image, we need to understand what the image consists of. Enter image segmentation.

So how does image segmentation work? We divide or partition the image into various parts called segments. It is not a great idea to process the entire image at once, as there will be regions in the image which do not contain any information. By dividing the image into segments, we can concentrate processing on the important segments. That, in a nutshell, is how image segmentation works. An image is a collection of different pixels; image segmentation groups together the pixels that have similar attributes. Take a moment to go through the below visual; it gives a practical idea of image segmentation.

Object detection builds a bounding box corresponding to each class in the image, but it tells us nothing about the shape of the object: we only get the set of bounding box coordinates, which is too vague when we want more detailed information. Image segmentation creates a pixel-wise mask for each object in the image, giving us a far more granular understanding of the object(s) in the image.

The Different Types of Image Segmentation:
We can broadly divide image segmentation techniques into two types. Consider the below images: can we identify the difference between them? Both images use image segmentation to identify and locate the people present.
• In image 1, every pixel belongs to a particular class (either background or person), and all the pixels belonging to a particular class are represented by the same color (background as black, person as pink). This is an example of semantic segmentation.
• Image 2 also assigns a particular class to each pixel of the image. However, different objects of the same class have different colors (Person 1 as red, Person 2 as green, background as black, etc.). This is an example of instance segmentation.
If there are 5 people in an image, semantic segmentation will classify all of them as a single "person" class, while instance segmentation will identify each person individually.
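One possible approach to this mini-project is sketched below: threshold the image into a foreground/background mask, then outline the largest foreground region as the detected object. OpenCV, the OpenCV 4 findContours return signature, and the file name "input.png" are all assumptions made for illustration; the manual does not mandate a specific technique or library.

# detect_object.py: segment an object from the image background
import cv2

img = cv2.imread("input.png")                 # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# Otsu's threshold separates foreground from background automatically
_, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Find connected foreground regions; keep the largest one as "the object"
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    obj = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(obj)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # bounding box
    cv2.drawContours(img, [obj], -1, (0, 0, 255), 2)            # object outline

cv2.imwrite("detected.png", img)

This corresponds to the semantic-segmentation view above (a single foreground class); labeling each region separately would move the approach toward instance segmentation.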
Conclusion: We have studied image segmentation to detect an object in the background of an image.