Training-Based Super Resolution Enhancement using CUDA Yang Yi-Ming Chang Shu-How {R99944035,D99922013,R99944018}@ntu.edu.tw Abstract Learning-based super-resolution can reconstruct the high resolution image results in high quality. However, building a training-based super resolution system has a huge computing complexity in times and spaces. In this paper, we propose a GPU-based super resolution system using Super Resolution Through Neighbor Embedding (SRNE) and GPU-Brute Force (BF)KNN search method to reduce the processing time. SRNE is that the local geometries of image patched are similar in two distinct HR and LR feature spaces. We use BF KNN method to search training database the similar image patch and reconstruct the HR training patches. Experiments show that our results can improve more than 10x speed up computation times. 1. Introduction Super resolution is the problem of generating a high resolution image from one or more low-resolution images. One of the most popular methods is “Super Resolution through Neighbor Embedding”(SRNE)[1]. This paper is on generating a high-resolution image (HR) from a single low-resolution image (LR) with a set of many training data images from scenes or different type. Chang et al. proposed method that is generated a high resolution image patch does not depend on only one of the nearest neighbors in the training set. In SRNE, there are some computational problems. For example, it is very challenging to build result image in a few minutes time, learning-based can‘t process online. In this paper, we propose a training-based super resolution in GPU parallel computing. Because SRNE has high parallel in feature extraction, build training data, KNN search method, grain matrix and reconstruct HR image. In KNN search, we apply fast KNN search in GPU[2], this method propose a CUDA implementation of the “brute force” KNN search to solving this problem. We consider SRNE in GPU computing that will have some performance improvement. And build a large training patch dataset will reconstruct high quality result images. The rest of the paper is organized as follows. In Sect. 2 we discuss related works in super resolution and GPU computing. And Sect. 3 we describe our SRNE with GPU in details. We report the experiment results and compare performance evaluation between CPU version and GPU Chang Man-Chia version SRNE in Sect 4. Finally, discuss and conclusion in Sect. 5. 2. Related work Resolution enhancement using smoothing or interpolation method is widely used in image processing. Smoothing is to use some filters such as Gaussian Filter and Median Filter. Interpolation method is usually achieved by using bilinear interpolation or bicubic interpolation, which are easy to implement. However, both methods may cause some artifacts. Recently, many methods have been proposed. Some methods make use of training images [5, 6] or require strong priors [7]. Each of these methods in high resolution image comes from only one nearest neighbor. Learning-based super resolution can recover the high resolution image in high quality. [2] use random projection tree with manifold learning and use GPU-base knn search to accelerate the procedure. In the paper we implement, it proposes a method use multiple training images to generate each image patch in high resolution image. We find k most similar patches to composite final high resolution patch given a specific weighting vector. This property makes generalization over the training samples is possible and hence fewer training samples are required. Most important of all, the method can be highly-parallelized using GPU. 3. System Architecture 3.1 System overview The system architecture overview is show in Fig. 1. There are two parts in this architecture, LLE processing in left-side and training processing in right-side. Fig.1 system flowchart 3. In system flowchart detail, we show in Fig. 2, each state has different color representation. Red state denote compute in GPU, light blue state is the addition function in SRNE method, brown means training data processing. 4. Compute the reconstruction weights of the neighbors that minimize the reconstruction error. Compute the high-resolution embedding patch q Yt using the appropriate high-resolution featuresof the Knearest neighbors and thereconstruction weights. 3.3 K-NN search using CUDA K nearest neighbor (KNN) is a method for classifying object based on closing training examples in feature space. In our method, we use Brute Force KNN search[2] to find k nearest LR patches in training data for each patch in query image, fig. 3 is an example of KNN search problem solution. Fig. 3 Example graph of the kNN search problem in k =3 Red cross is query point, blue points are reference points Fig. 2 system flowchart in detail At the pre-processing stage, we apply color transform and calculate patch complexity, a simple check mechanism – patch complexity measurement, such as patch mean and variance. Because we believe that some patches are too smooth or no evidently different than neighbor patches if those run into LLE process will be increased more computing. In GPU computation, there widely classify three categories, global memory, shared memory and register. Global memory can store large data as video card support, usually 128MB to 1GB in general market. Shared memory is opposite, it’s local memory and only at the same SM can use. So we need to reduce the memory usage and more efficiency. 3.2 LLE processing First, given some variable definition, an input of low-resolution imageX t , we estimate the high resolution image Yt from training data set of one or more low-resolution image Xs and the corresponding high-resolution images Ys . As in LLE, it can be summarized as four parts: 1. Compute the input low-resolution patch feature vector. 2. Find the K nearest neighbor in KNN search method. Letπ = {π1 , π2 , π3 , … , ππ }be the set of training patches feature vectors,π = {π1 , π2 , π3 , … , ππ } be the set of query image patch feature vector. Our task is to search the k nearest neighbors of each query patch given a specific distance. In this project we use 2-norm distance as metrix. In our implementation,we use brute (BF) search to find KNN. The BF algorithm is the following: 1.Compute all the distance between ππ andππ , j ∈ [1, m]. 2.Sort the computed distance. 3.Choose the k training patches feature vector corresponding to k smallest distance 4.Repeat step 1. to 3. For all query image patch The two main parts BF method of KNN are computation of distance and sorting. Each of these two parts is highly-parallelizable. This property makes the BF method perfectly suitable for CUDA implementation. The computation of distance can be fully parallelized since the distance between pairs of points are independent. The sorting part can also be implemented in CUDA. Each thread sorts all distances computed for a given query patch. 3.4 Neighbor embedding method For each patch, in the low-resolution imageXt , first compute the reconstruction weights of its neighbors in Xs by minimizing the local reconstruction error. The HR embedding is then estimated from the training image pairs by preserving local geometry. At last, we overlap training HR patches in the result image. The neighbor embedding algorithm as follows: q 1. For each patch Xt , in image Xt : A. Find the set Nq of K nearest neighbors in Xs , using [2]. B. Compute the reconstruction weights of the neighbors that minimize the error of q reconstructing Xt q C. Compute the high-resolution embedding Yt using the appropriate high-resolution features of the K nearest neighbors and the reconstruction weights, example in fig.4. 2. Construct the target high-resolution image Yt by enforcing local compatibility and smoothness constraints between adjacent patches obtained instep 1C. Fig. 4 Neighbor embedding procedure applied to a low-resolution patch for3X magnification 4 Fig 5.(a) build Fig. 5(b) bilinear 2X Fig. 5(c) SRNE 2X 4.3 Performance evaluation In this section, we analyze the CPU and GPU processing time in SRNE. The scaling factors are all 2 and the database contains 46370 patches. Table 1 show the performance between CPU and GPU version SRNE. There are four test image dataset in this evaluation, original size from 50x50 to 171x120. The speed-up detail which show in Table 1. Image house testing01 paint butterfly house01 Size 50x50 124x106 125x137 156x100 171x120 Table 1 CPU(s) 35.42 162.99 281.95 176.11 364.96 GPU(s) 1.93 7.55 9.67 8.87 11.42 Speed up 18.37 21.60 29.15 19.85 31.97 Results 4.1 Setup The computer use for show results and evaluation performance is an Intel Duo CPU3 GHz, with 4GB of RAM memory. The graphic card we used is a NVIDIA GeForce 9600 GT with 2G memory and 64 multiprocessors. Fig. 6(a) house Fig. 6(b) Result(2x) Fig. 7(a) testing01 Fig. 7(b) Result(2x) 4.2 Bilinear vs. SRNE with GPU In results, we show some results between bilinear and our SRNE method in GPU. Second, we give the processing time between CPU and GPU version. Fig. 5 shows the bilinear interpolation and our results. We apply neighbor embedding to a small 3x3 patch from a low-resolution house image in fig. 5(a). Other results show in fig.6 to fig. 10. 5 Fig. 8(a) paint Fig. 8(b)Result(2x) Fig. 9(a) butterfly Fig. 9(b) Result(2x) Fig. 10(a) house01 Fig. 10(b) Result(2x) Conclusion In this paper, we propose a GPU-based super resolution system using Super Resolution Through Neighbor Embedding (SRNE) and GPU-Brute Force (BF) KNN search. We show that using CUDA to do super resolution can accelerate up to a factor between 20 and 30 compared to CPU-base implementation. This improvement allows us to get higher quality high resolution image with more training images within short time. Besides, we speed up whole procedure with pre-processing the image by checking patch complexity. Image patches with lower complexity represent its smoothness. We skip the CUDA process and just do bilinear interpolation for these smooth patches. Future work is that we want to accelerate the linear system solving part in CUDA since we just implement in CPU currently. References [1] H. Chang, D.Y Yeung, Y. Xiong, “Super-Resolution Through Neighbor Embedding”, CVPR 2004, pp. 275 – 282, 2004 [2] J. Pu, J. Zhang, P. Guo and X. Yuan, “Interactive Super-resolution through Neighbor Embedding,” ACCV 2009, pp. 496 – 505, 2009 [3] V. Garcia and E. Debreuve and M. Barlaud. “Fast K Nearest Neighbor Search using GPU”, CVPR Workshop on Computer Vision on GPU, June 2008 [4] V. Garcia. Ph.D. Thesis: Suivi d'objets d'intérêt dans une séquence d'images : des points saillants aux mesures statistiquesUniversité de Nice - Sophia Antipolis, Sophia Antipolis, France, December 2008 [5] W.T. Freeman, T.R. Jones, and E.C. Pasztor. “Example-based Super-Resolution”, Computer Graphics and Applications, 22(2):56–65, 2002. [6] W.T. Freeman and E.C. Pasztor, “Learning Low-level vision”, ICCV 1999, volume 2, pages 1182–1189, September 1999. [7] R.R. Schultz and R.L. Stevenson, ”A Bayesian Approach to Image Expansion for Improved Deο¬nition”, IEEE Transactions on Image Processing, 3(3):233–242, 1994.