Super-Resolution of Remotely-Sensed Images Using a Learning-Based Approach
Isabelle Bégin and Frank P. Ferrie

Abstract

Super-resolution addresses the problem of estimating a high-resolution version of a low-resolution image. In this research, a learning-based approach to super-resolution is applied to remotely-sensed images. These images contain fewer sharp edges than those previously used to test the method, making the problem more difficult. The approach uses a Markov network to model the relationship between an image/scene training pair (a low-resolution/high-resolution pair). Bayesian belief propagation is used to obtain the posterior probability of the scene given a low-resolution input image. Preliminary results on Landsat-7 images show that the framework works reasonably well when given a sufficiently large training set.

Construction of the Training Set

The steps involved in the construction of the training set and in the computation of the joint probabilities are as follows:
1. Blur and subsample a set of high-resolution images. The set should be large enough to include the widest possible range of intensity structures.
2. Interpolate the images back to the original resolution.
3. Break the images and scenes into center-aligned patches.
4. Apply Principal Components Analysis to the training set.
5. Fit Gaussian mixtures to the joint probabilities P(x_k, x_j) (neighbouring scene patches k and j) and P(y_k, x_k) (image/scene pair), as well as to the prior probability of a scene patch P(x_j).

Problem

Is it possible to infer a high-resolution version of an image that is first given at low resolution? Approaches stronger than simple interpolation use Bayesian techniques or regularization while assuming prior probabilities or constraints. Furthermore, these techniques are independent of both the structures present in the image and the nature of the image itself. Freeman et al.
propose that statistical relationships between pairs of low-resolution and high-resolution images (image/scene pairs) can be learned. Thus, given a new image to super-resolve (of the same nature as the images in the training set), it is possible to obtain a higher-resolution image based on the statistics acquired from the training pairs. The most important features learned by the method are edges: the algorithm learns what a blurry edge should look like in its corresponding high-resolution image. The training data therefore need to contain many sharp edges, and the experiments described in Freeman et al. were done mostly on such training sets. In the present research, the specific problem of super-resolution of remotely-sensed images (with few sharp edges) is addressed.

Computation of the Conditional Probabilities

1. Break a new low-resolution (interpolated) image into patches and apply Principal Components Analysis.
2. For each patch y_k, find the n closest patches in the training set. The n corresponding scenes will be used in the calculations.

Theory

The statistical dependencies between image/scene training pairs (low-resolution and high-resolution images) are modeled in a Markov network where each node describes an image or a scene patch. The topology of the network is shown in Figure 1, where statistical dependencies are shown by lines connecting nodes; scene patches are depicted in white and image patches in red.

Results

Figure 3. (a) Input image (bilinear interpolation) (b) Result of the super-resolution algorithm (c) Original full-resolution image (d) Nearest neighbour interpolation

A new image was blurred, subsampled and interpolated to form the input image. The result of the super-resolution algorithm (Figure 3b) approximates the high-resolution image (Figure 3c) reasonably well. Edges are better preserved than in the interpolated version of the input image (Figure 3a).
For comparison, nearest neighbour interpolation is shown in Figure 3d, where the edges appear choppier. However, thin structures (such as road networks) are lost in the super-resolution process. This problem might be caused by too small a patch size as well as by an incomplete training set.

Computation of the Conditional Probabilities (continued)

3. The conditional probabilities for each candidate are computed with the following equations:

   P(x_k^l | x_j^m) = P(x_k^l, x_j^m) / P(x_j^m),   (3)

   P(y_k | x_k^l) = P(y_k, x_k^l) / P(x_k^l),   (4)

where l and m run from 1 to n and index the candidates for patches x_k and x_j, respectively. P(y_k | x_k^l) is thus an n x 1 vector and P(x_k^l | x_j^m) is an n x n matrix. The messages M_j^k are n x 1 vectors whose elements are set to 1 for the first iteration.
4. The MAP estimate for patch j (x_j^MAP) is obtained from equations (1) and (2).

The advantage of this learning-based algorithm over other learning methods is that local support is added in the form of conditional probabilities. In other learning-based approaches, the closest patch in the training set is chosen regardless of its neighbours. This Bayesian technique also has the advantage of learning the prior probabilities instead of assuming them.

Discussion

Many factors influence the quality of the results.
• This framework works better with training images that contain many sharp edges, which is not the case for the images used in this experiment. In the examples shown, no pre-processing was applied to the images; results may be improved by first enhancing the edges in the image.
• Because the framework is statistical, the size and completeness of the training set have a large impact on the result. In this experiment the training set was small (a 350 x 350 image) and therefore might not contain the full variety of structures possibly present in a remotely-sensed image.
• The fitting of the Gaussian mixture models is also very important. Models were obtained with the Expectation Maximization algorithm.
The number of iterations, the structure of the covariance and the number of clusters can all modify the computation of the joint probabilities, which are the core of the framework.
• The size of the patches, the number of principal components and the number of closest patches considered also impact the final result.

Figure 1: Markov network for the super-resolution problem

There are two types of connections. First, each scene patch is connected to its neighbours. Second, each patch in the scene is connected to its corresponding patch in the low-resolution image. Two nodes that are not directly connected are assumed to be statistically independent. The Markov assumption allows a "message-passing" rule that involves only local computations. The maximum a posteriori (MAP) estimate for node j is calculated using the following equations (Freeman et al.):

   x_j^MAP = argmax_{x_j} P(x_j) P(y_j | x_j) ∏_k M_j^k,   (1)

where

   M_j^k = max_{x_k} P(x_k | x_j) P(y_k | x_k) ∏_{l≠j} M̃_k^l.   (2)

Training

The goal of this experiment was to test whether a learning-based approach to super-resolution can be used on remotely-sensed images, which contain fewer sharp edges than those previously used to test the method. The training set is a pair of blurred and non-blurred Landsat-7 images (Band #3). The 350 x 350 image was first blurred with a Gaussian filter, subsampled, and interpolated back to the original resolution.

Conclusions

This research presented a learning-based method for super-resolution applied to remotely-sensed images. A Markov network was used to express the statistical dependencies between a low-resolution/high-resolution training pair. The maximum a posteriori estimate of the scene was obtained by Bayesian belief propagation. Preliminary results show that such an approach can be used on remotely-sensed images provided that the training set is sufficiently large.
Future work will include preprocessing of the images to enhance the edges they contain, as well as the use of a more complete training set.

In equation (2), M̃_k^l is the message in the previous iteration. In principle, this rule should only be used on chains or trees. However, the topology of the Markov network for the super-resolution problem contains loops. It was previously demonstrated that, in this situation, equations (1) and (2) can still be used; the network is thus assumed to consist of vertical or horizontal chains.

Figure 2: Training set (left: sharp image; right: blurred image)

Reference

Freeman, W.T., Pasztor, E.C. and Carmichael, O.T., "Learning Low-Level Vision", International Journal of Computer Vision, 40(1), pp. 25-47, 2000.
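As an illustration, the patch-extraction, PCA, and nearest-neighbour candidate search (Construction of the Training Set, steps 3-4; Computation of the Conditional Probabilities, steps 1-2) can be sketched in a few lines of NumPy. The patch size, number of principal components, and number of candidates are illustrative choices only; the poster does not specify the values used.

```python
import numpy as np

def extract_patches(img, size=5):
    """Break an image into overlapping, center-aligned square patches."""
    h, w = img.shape
    return np.array([img[r:r + size, c:c + size].ravel()
                     for r in range(h - size + 1)
                     for c in range(w - size + 1)])

def fit_pca(patches, n_components=8):
    """Principal Components Analysis via SVD of the centred patch matrix."""
    mean = patches.mean(axis=0)
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt[:n_components]          # mean and principal directions

def project(patches, mean, basis):
    """Express patches as coefficient vectors in the PCA basis."""
    return (patches - mean) @ basis.T

def n_closest(query, train_coeffs, n=10):
    """Indices of the n training patches closest to `query` in PCA space."""
    dists = np.linalg.norm(train_coeffs - query, axis=1)
    return np.argsort(dists)[:n]
```

Projecting both the training patches and the input-image patches with the same PCA basis keeps the candidate search in a low-dimensional space.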
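Equations (3) and (4) amount to dividing the fitted joint probabilities, evaluated at the n candidates, by the appropriate priors. A minimal sketch, with toy probability tables standing in for the Gaussian-mixture evaluations:

```python
import numpy as np

def conditionals(joint_xx, joint_yx, p_xj, p_xk):
    """Turn joint probabilities into the conditionals of equations (3)-(4).

    joint_xx[l, m] : P(x_k^l, x_j^m), evaluated for the n x n candidate pairs
    joint_yx[l]    : P(y_k, x_k^l), evaluated for the n candidates
    p_xj[m]        : prior P(x_j^m);  p_xk[l] : prior P(x_k^l)
    """
    cond_xx = joint_xx / p_xj[None, :]   # equation (3): n x n matrix
    cond_yx = joint_yx / p_xk            # equation (4): n x 1 vector
    return cond_xx, cond_yx
```

When the priors are the marginals of the joint, each column of the resulting n x n matrix sums to one, as a conditional distribution should.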
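Equations (1) and (2) can be illustrated with a minimal max-product belief-propagation sketch on a one-dimensional chain of scene nodes (the grid network is assumed to consist of horizontal and vertical chains, so the chain case is representative). All priors, likelihoods, and compatibility tables here are toy values, not outputs of the fitted mixtures.

```python
import numpy as np

def propagate(prior, lik, compat, n_iter=5):
    """Max-product belief propagation on a chain of scene nodes.

    prior[j]  : (n,) prior over the n scene candidates at node j
    lik[j]    : (n,) likelihood P(y_j | x_j) for each candidate
    compat[j] : (n, n) compatibility P(x_{j+1} | x_j) between the
                candidates of neighbouring nodes j and j+1
    Returns the index of the MAP candidate at each node, i.e. the
    argmax of equation (1) with messages updated as in equation (2).
    """
    J, n = len(prior), len(prior[0])
    m_left = [np.ones(n) for _ in range(J)]    # message arriving from the left
    m_right = [np.ones(n) for _ in range(J)]   # message arriving from the right
    for _ in range(n_iter):
        new_left = [np.ones(n) for _ in range(J)]
        new_right = [np.ones(n) for _ in range(J)]
        for j in range(J - 1):
            # message node j sends rightward: excludes what came from j+1
            out = lik[j] * m_left[j]
            new_left[j + 1] = (compat[j] * out[:, None]).max(axis=0)
        for j in range(J - 1, 0, -1):
            # message node j sends leftward: excludes what came from j-1
            out = lik[j] * m_right[j]
            new_right[j - 1] = (compat[j - 1] * out[None, :]).max(axis=1)
        m_left, m_right = new_left, new_right
    beliefs = [prior[j] * lik[j] * m_left[j] * m_right[j] for j in range(J)]
    return [int(np.argmax(b)) for b in beliefs]
```

Messages start at 1 (as stated for the first iteration above); each node's belief combines its prior, its likelihood, and the incoming messages, matching the product in equation (1).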