Performance Evaluations for Super-Resolution Mosaicing on UAS Surveillance Videos

Abstract Unmanned Aircraft Systems (UAS) have been widely applied for reconnaissance and surveillance by exploiting information collected from the digital imaging payload. The super-resolution (SR) mosaicing of low-resolution (LR) UAS surveillance video frames has become a critical requirement for UAS video processing and is important for further effective image understanding. In this paper we develop a novel super-resolution framework, which does not require the construction of sparse matrices. The proposed method implements image operations in the spatial domain and applies an iterated back-projection to construct super-resolution mosaics from the overlapping UAS surveillance video frames. The Steepest Descent method, the Conjugate Gradient method and the Levenberg-Marquardt algorithm are used to numerically solve the nonlinear optimization problem for estimating a super-resolution mosaic. A quantitative performance comparison in terms of computation time and visual quality of the super-resolution mosaics through the three numerical techniques is presented.


Introduction
An Unmanned Aircraft System (UAS) [1] is an aircraft or ground station that can either be remotely controlled manually or is capable of flying autonomously under the guidance of pre-programmed GPS waypoint flight plans or more complex on-board intelligent systems. UAS aircraft have recently found extensive applications in military reconnaissance and surveillance, homeland security, precision agriculture, wildlife conservation, fire monitoring and analysis, and other kinds of aid during disasters. Different UAS missions can be conducted through surveillance videos captured by a UAS digital imaging payload over the areas of interest. However, the data analysis of UAS videos is frequently limited by motion blurring, which results from frame-to-frame movement induced by aircraft rolling and wind gusting, as well as by less than ideal atmospheric conditions, the noise inherent within the image sensors, etc. Therefore, the super-resolution mosaicing of low-resolution UAS surveillance video frames has become a critical requirement for UAS video processing and a pre-step for further effective image understanding.
Given multiple images of a particular scene, multi-frame super-resolution reconstructs a high-resolution image with a resolution above the limits of the camera [2][3][4]. The super-resolved image should have more details than any of the low-resolution images. Mosaicing is the alignment or stitching of two or more images into a single more informative composition representing a 3D scene [5][6]. Generally speaking, the mosaicing creates a panorama, which is impossible to visualize with only one video frame.
Super-resolution mosaicing combines multi-frame super-resolution with mosaicing and has a number of applications when surveillance video from a UAS or a satellite is used. One clear application is the surveillance of certain areas, even at night, with the use of an infrared (IR) imaging system. The UAS can fly over areas of interest and generate super-resolved mosaics that can be analysed at the ground control station. Other important applications involve the supervision of high voltage transmission lines, oil pipes, highway systems, etc. NASA also uses super-resolution mosaics to study the surface of Mars, the Moon and other planets.
Super-resolution mosaicing has been studied by several researchers. Zomet and Peleg [7] applied overlapping areas within a sequence of video frames to create a super-resolved mosaic. In their method, the SR reconstruction technique proposed in [8] is applied to a strip rather than a whole image, so the resolution of each strip is enhanced by fusing all the frames that contain that particular strip. The disadvantage is that this method is computationally expensive. Ready and Taylor [9] introduced a Kalman filter to compute the super-resolved mosaic. They added unobserved data to the mosaic using Dellaert's method. Basically, they constructed a matrix for the observed pixels to estimate pixel values, using a homography matrix and the point spread function (PSF). Because this matrix is extremely large, they used a Kalman filter and diagonalization of the covariance matrix to reduce the amount of storage and computation. The drawbacks of this algorithm are the use of a large matrix and the fact that its best results on synthetic data reach a PSNR of 31.6 dB. Simolic and Wiegand [10] developed a method based on image warping. In this method, each pixel from every frame is mapped into the SR mosaic and its grey level value is assigned to the corresponding pixel in the SR mosaic within a range of ±0.2 pixel units. The drawback of this method is that it requires highly accurate motion vectors and homographies, which is very difficult to achieve for real surveillance videos from UAS.
Wang, Fevig and Schultz [11] used the overlapped area within five consecutive frames from a video sequence. Sparse matrices were then applied to model the relationship between the LR and SR frames, which can be solved using maximum a posteriori estimation. To deal with the ill-posed problem of the super-resolution model, they adopted hybrid regularization. The drawback of this method is that several sparse matrices have to be built for every five frames. Therefore, this method is not appropriate for processing, in real time, a real video sequence that contains thousands of frames. Pickering and Ye [12] proposed an interesting model for mosaicing and super-resolution of video sequences, where the regularization factor is based on the Laplacian operator. The problem with the Laplacian factor is that it forces spatial smoothness, so edge pixels are removed along with noise in the regularization process. Arican and Frossard [13] used the Levenberg-Marquardt (LM) algorithm to compute the SR of omnidirectional images. Chung [14] proposed a nonlinear least squares solution based on the Gauss-Newton method. The disadvantage of this approach is that it only works for small images.
Our method combines ideas from most of these techniques, but it handles super-resolution mosaicing in a different manner that does not require the construction of sparse matrices. Therefore, it is feasible to apply the algorithm to a relatively long image sequence and obtain a video mosaic. In addition, we adopt Huber regularization, which preserves high-frequency content and therefore sharp edges. Furthermore, we model the super-resolution mosaicing problem in a convex framework [4], which guarantees the convergence of the proposed algorithm.

Observation Model
Assuming that there are $K$ frames of LR images available, the observation model can be represented as:

$$y_k = D B_k W_k R[x]_k + \eta_k \qquad (1)$$

Here, $y_k$ ($k = 1, 2, \ldots, K$), $x$ and $\eta_k$ represent the $k$th LR image, the part of the real world depicted by the super-resolution mosaic and the additive noise, respectively. The observation model in (1) introduces $R[x]_k$, which represents the reconstruction of the $k$th warped SR image from the original high-resolution data $x$. The geometric warping operator and the blurring matrix between $x$ of the real world and the $k$th LR image frame $y_k$ are represented by $W_k$ and $B_k$, respectively. The decimation operator is denoted by $D$. The motion between frames is modelled with a planar homography. We compute the homography from correspondences of SIFT (Scale Invariant Feature Transform) features [15][16] using the Random Sample Consensus (RANSAC) strategy [17]. The robustness of the SIFT feature has been verified in feature matching and object recognition. The estimation of the unknown SR mosaic image is based not only on the observed LR images, but also on assumptions about the additive noise and the blurring process. The additive noise, $\eta_k$, is considered to be independent and identically distributed (iid) white Gaussian noise. Blurring is assumed to come only from the optical equipment. Therefore, the problem of finding the maximum likelihood estimate (MLE) of the SR mosaic image $x$ can be formulated as:

$$\hat{x}_{MLE} = \arg\min_{x} \sum_{k=1}^{K} \left\| y_k - D B_k W_k R[x]_k \right\|^2 \qquad (2)$$

where $\|\cdot\|$ denotes the Euclidean norm. As the SR reconstruction is an ill-posed inverse problem, we need to add another term for regularization, which must contain prior information for the SR mosaicing. This regularization term helps to convert the ill-posed problem into a well-posed solvable problem. Here we adopt the Huber regularization:

$$\hat{x} = \arg\min_{x} \left\{ \sum_{k=1}^{K} \left\| y_k - D B_k W_k R[x]_k \right\|^2 + \lambda \sum_{i} \rho\big( (G x)_i \mid \alpha \big) \right\} \qquad (3)$$

where $\lambda$ weights the regularization term, $(Gx)_i$ is the $i$th response of the gradient operator $G$ over the cliques and $\alpha$ is the Huber threshold. The Huber function is defined as:

$$\rho(t \mid \alpha) = \begin{cases} t^2, & |t| \le \alpha \\ 2\alpha |t| - \alpha^2, & |t| > \alpha \end{cases} \qquad (4)$$
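The Huber penalty can be sketched in a few lines of NumPy. The snippet below is a minimal illustration, assuming the common scaling in which the penalty is $t^2$ for $|t| \le \alpha$ and $2\alpha|t| - \alpha^2$ beyond it; the function name `huber` and the threshold values are illustrative choices, not taken from the paper.

```python
import numpy as np

def huber(t, alpha):
    """Huber penalty: quadratic for |t| <= alpha, linear beyond it.

    Assumed scaling: t**2 inside the threshold and 2*alpha*|t| - alpha**2
    outside, so the two branches meet continuously at |t| = alpha.
    """
    t = np.asarray(t, dtype=float)
    quadratic = t ** 2
    linear = 2.0 * alpha * np.abs(t) - alpha ** 2
    return np.where(np.abs(t) <= alpha, quadratic, linear)

# Small residuals are penalised quadratically, large ones only linearly,
# which is why strong edges are smoothed less than under a pure quadratic prior.
values = huber([-3.0, -0.5, 0.0, 0.5, 3.0], alpha=1.0)
```

Because the penalty grows linearly for large arguments, a large intensity jump across an edge costs far less than it would under a purely quadratic penalty.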

Super-resolution Mosaicing Using Steepest Descent Method
Based on the gradient descent algorithm for minimizing (3), the robust iterative update moves the current estimate $x^{(n)}$ along the negative gradient of the objective in (3) to obtain $x^{(n+1)}$. In this update, $G$ is the gradient operator over the cliques [8,18], and the regularization operator is recomputed at each iteration $n$. Furthermore, the derivative of the Huber function is given as:

$$\rho'(t \mid \alpha) = \begin{cases} 2t, & |t| \le \alpha \\ 2\alpha \, \mathrm{sign}(t), & |t| > \alpha \end{cases}$$

The gradient operator $G$ has the advantage over the Total Variation (TV). The Huber function is convex, and both it and its gradient with respect to $x^{(n)}$ are continuous [19]. Therefore, the optimization problem can be solved using gradient-based methods such as the steepest descent and the conjugate gradient methods.
Our proposed method accounts for spatial interactions between neighbouring pixels. These interactions are determined by the clique structure: the activity is computed with finite-difference approximations to the second-order directional derivatives (vertical, horizontal and two diagonal directions) in each super-resolution mosaic estimate $x^{(n)}$.
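The four directional second differences can be computed with array slicing. This is a sketch under stated assumptions: the function name `clique_activity` is hypothetical, only interior pixels are evaluated, and the diagonal differences are scaled by 1/2 to account for the longer diagonal spacing, which is a common convention but is not confirmed by the text.

```python
import numpy as np

def clique_activity(x):
    """Second-order finite differences at interior pixels, in four directions.

    Returns an array of shape (4, H-2, W-2) ordered as
    horizontal, vertical, main diagonal, anti-diagonal.
    """
    x = np.asarray(x, dtype=float)
    centre = x[1:-1, 1:-1]
    horizontal = x[1:-1, :-2] - 2.0 * centre + x[1:-1, 2:]
    vertical = x[:-2, 1:-1] - 2.0 * centre + x[2:, 1:-1]
    # Assumed 1/2 weight: diagonal neighbours are sqrt(2) pixels away.
    diag_main = 0.5 * (x[:-2, :-2] - 2.0 * centre + x[2:, 2:])
    diag_anti = 0.5 * (x[:-2, 2:] - 2.0 * centre + x[2:, :-2])
    return np.stack([horizontal, vertical, diag_main, diag_anti])

rows, cols = np.meshgrid(np.arange(6.0), np.arange(6.0), indexing="ij")
flat_ramp = clique_activity(2.0 * rows + 3.0 * cols)   # planar image: no activity
curved = clique_activity(cols ** 2)                    # curvature along columns only
```

A planar intensity ramp produces zero activity in every direction, so the regularizer penalizes curvature (noise and ringing) rather than smooth gradients.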

Super-resolution Mosaicing Using Conjugate Gradient Method
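The solution of (3) can be estimated using the conjugate gradient method, in which the estimate $x^{(n)}$ is updated along a search direction $p^{(n)}$. The direction $p^{(n)}$ is chosen to be conjugate to all previous search directions with respect to the Hessian matrix $H$:

$$\left(p^{(n)}\right)^T H \, p^{(j)} = 0, \qquad j < n$$

The gradient vector is the gradient of the objective in (3) evaluated at $x^{(n)}$. The gradient operator $G$ is the same as that in the steepest descent method.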
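To illustrate what conjugacy with respect to $H$ means in practice, here is a sketch of the linear conjugate gradient method on a small quadratic $\tfrac{1}{2} x^T H x - b^T x$ with a symmetric positive-definite $H$. This is a textbook simplification, not the paper's solver: the objective in (3) is not quadratic, and the matrix `H`, the vector `b` and the function name are illustrative assumptions.

```python
import numpy as np

def conjugate_gradient(H, b, x0=None, tol=1e-10, max_iter=None):
    """Minimise 0.5 * x^T H x - b^T x for symmetric positive-definite H."""
    n = b.size
    max_iter = 2 * n if max_iter is None else max_iter
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    r = b - H @ x            # residual = negative gradient of the quadratic
    p = r.copy()             # first direction is the steepest descent direction
    rs_old = r @ r
    directions = []
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Hp = H @ p
        step = rs_old / (p @ Hp)          # exact line search along p
        x = x + step * p
        r = r - step * Hp
        directions.append(p.copy())
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p     # new direction, H-conjugate to the old ones
        rs_old = rs_new
    return x, directions

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = A.T @ A + np.eye(5)      # symmetric positive-definite test matrix
b = rng.standard_normal(5)
solution, directions = conjugate_gradient(H, b)
```

In exact arithmetic the method reaches the minimizer of an $n$-dimensional quadratic in at most $n$ steps, because each new direction is conjugate to all earlier ones.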

Super-resolution Mosaicing Using Levenberg-Marquardt Method
Similar to the gradient methods, the Levenberg-Marquardt method [20] can converge from an initial guess, which may be outside of the convergence region of other methods. In order to minimize (3), we write the model as $u = f(x)$, where $J(x)$ is the Jacobian matrix of $f$:

$$J(x) = \frac{\partial f(x)}{\partial x}$$

The Levenberg-Marquardt method is an iterative process. Starting from the initial point $x^{(0)}$, the step $\delta x$ is chosen to minimize:

$$\left\| u - f(x + \delta x) \right\| \approx \left\| u - f(x) - J \, \delta x \right\| = \left\| \varepsilon - J \, \delta x \right\|$$

where $\varepsilon = u - f(x)$, so that $\delta x$ can be found by solving a linear least squares problem [18,20]. The minimum is attained when $J \, \delta x - \varepsilon$ is orthogonal to the column space of $J$. This leads to:

$$J^T \left( J \, \delta x - \varepsilon \right) = 0, \qquad \text{i.e.} \qquad J^T J \, \delta x = J^T \varepsilon$$

where $J^T J$ is called the pseudo-Hessian matrix. The Levenberg-Marquardt method then solves the damped form of this system, Equation (16), for $\delta x$:

$$\left( J^T J + c I \right) \delta x = J^T \varepsilon \qquad (16)$$

After $\delta x$ is known, we have:

$$x^{(n+1)} = x^{(n)} + \delta x$$

Here $c$ is the Levenberg-Marquardt damping term that determines the behaviour of the gradient in each iteration. If $c$ is close to zero, then the algorithm behaves like a Gauss-Newton (GN) method, but if $c \to \infty$, then the algorithm behaves like the steepest descent (SD) algorithm. The values of $c$ during the iterative process are chosen in the following way. At the beginning of the iterations, $c$ is set to a large value, so that the LM method inherits the robustness of SD and the initial guess of the solution to (3) can be chosen with less caution. It is necessary to save the error of each iteration and to compare two consecutive errors. If $\mathrm{error}^{(k)} < \mathrm{error}^{(k-1)}$, $c$ is decreased by a certain amount so that LM behaves like the Gauss-Newton method, which speeds up convergence. Otherwise, $c$ is increased to a larger value and the search area is extended, which means that LM behaves like SD. Here $\mathrm{error}^{(k)}$ denotes the error at iteration $k$.
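The damping schedule described above can be sketched on a small curve-fitting problem. The snippet is an illustration under explicit assumptions rather than the authors' solver: it uses the identity-damped system $(J^T J + cI)\,\delta x = J^T \varepsilon$, rejects steps that increase the error (a common variant; the paper reports that its error may increase between iterations), and the factors `shrink` and `grow`, the starting value of `c` and the exponential test model are all hypothetical choices.

```python
import numpy as np

def levenberg_marquardt(f, jac, u, x0, c=10.0, iters=100, shrink=0.1, grow=10.0):
    """Minimise ||u - f(x)||^2 with an adaptive damping term c."""
    x = np.asarray(x0, dtype=float)
    err = np.sum((u - f(x)) ** 2)
    for _ in range(iters):
        eps = u - f(x)                              # current residual
        J = jac(x)
        A = J.T @ J + c * np.eye(x.size)            # damped pseudo-Hessian
        dx = np.linalg.solve(A, J.T @ eps)
        x_new = x + dx
        err_new = np.sum((u - f(x_new)) ** 2)
        if err_new < err:
            # Error went down: accept and move towards Gauss-Newton behaviour.
            x, err = x_new, err_new
            c *= shrink
        else:
            # Error went up: keep x and move towards steepest descent behaviour.
            c *= grow
    return x, err

# Toy model (an assumption for illustration): u = a * exp(b * t).
t = np.linspace(0.0, 1.0, 20)
model = lambda p: p[0] * np.exp(p[1] * t)
model_jac = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
observed = model(np.array([2.0, -1.5]))
params, final_err = levenberg_marquardt(model, model_jac, observed, x0=[1.0, 0.0])
```

Starting with a large `c` makes the first steps short and gradient-like, which tolerates a poor initial guess; as the error falls, `c` shrinks and the steps approach Gauss-Newton steps.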

Experimental results
The experimental tests are based on three sets of data. One is a synthetic data set. The other two are real UAS data sets, one of grey-level images and the other of colour images. We created synthetic LR frames from a single high-resolution image. These LR frames were first produced using different translations (18 to 95 pixels), rotations (5° to 10°) and scales (1 to 1.5), and were then blurred with a Gaussian kernel. The real grey video data were captured by an experimental small UAS operated by Lockheed Martin Corporation flying a custom-built electro-optical (EO) and an uncooled thermal infrared (IR) imager. The time series of images were extracted from the UAS videos at a low resolution of 60 × 80. The colour image data were collected with a regular camera mounted in a UAS by Cloud Cap Technology.
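The generation of synthetic LR frames can be sketched as a warp, a Gaussian blur and a decimation applied to one high-resolution image. The code below is a simplified illustration, not the procedure used for the experiments: it assumes nearest-neighbour resampling, rotation and scaling about the image centre, zero fill outside the image, and hypothetical values for the decimation `factor`, the blur `sigma` and the kernel `radius`.

```python
import numpy as np

def similarity_warp(img, angle_deg, scale, shift):
    """Rotate, scale and translate by inverse mapping (nearest neighbour)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Undo the shift, then the rotation and scale about the image centre.
    y0, x0 = yy - cy - shift[0], xx - cx - shift[1]
    src_y = (cos_t * y0 + sin_t * x0) / scale + cy
    src_x = (-sin_t * y0 + cos_t * x0) / scale + cx
    iy, ix = np.rint(src_y).astype(int), np.rint(src_x).astype(int)
    valid = (iy >= 0) & (iy < h) & (ix >= 0) & (ix < w)
    out = np.zeros_like(img, dtype=float)   # pixels mapped from outside stay 0
    out[valid] = img[iy[valid], ix[valid]]
    return out

def gaussian_blur(img, sigma=1.0, radius=3):
    """Separable Gaussian blur with zero padding at the borders."""
    taps = np.arange(-radius, radius + 1)
    kernel = np.exp(-(taps ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def make_lr_frame(hr, angle_deg, scale, shift, factor=2, sigma=1.0):
    """Warp, blur, then keep every `factor`-th pixel in each direction."""
    warped = similarity_warp(hr, angle_deg, scale, shift)
    return gaussian_blur(warped, sigma)[::factor, ::factor]

hr_image = np.arange(256, dtype=float).reshape(16, 16)
lr_frame = make_lr_frame(hr_image, angle_deg=7.0, scale=1.2, shift=(1, -2))
```

Calling `make_lr_frame` repeatedly with different angles, scales and shifts yields a set of overlapping LR frames whose ground-truth high-resolution source is known, which is what makes a PSNR comparison possible on synthetic data.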
We ran the three proposed algorithms for super-resolution mosaicing on both synthetic data and real data and then compared their performance. The mosaics constructed from the low-resolution input images are used as the initializations for the proposed algorithms. The comparisons are based on PSNR (Peak Signal to Noise Ratio), running time and iteration error for the synthetic data sets, and on running time and iteration error only for the real data from UAS videos, because there is no ground truth data available to compute the PSNR for real data. Figures 1, 2 and 3 show the super-resolution mosaics produced by the three different algorithms on the synthetic test data and two sets of real video data. Tables 1, 2 and 3 list the corresponding quantitative comparisons for outcomes from the three different algorithms. From Figures 1, 2 and 3 and Tables 1, 2 and 3, it can be seen that all the methods improve the resolution of the LR mosaic and all of them improve the colour, details and sharpness. However, when the image is grey (IR images), the Levenberg-Marquardt method produces some artefacts since it solves a linear least squares equation that is close to being singular (13). The error for the steepest descent and conjugate gradient algorithms decreases with every iteration, which means that they move closer to the optimal solution at every step. However, the error of the Levenberg-Marquardt algorithm can decrease or increase between iterations due to the use of the damping factor, $c$, which accelerates the search for the optimal solution. The Levenberg-Marquardt method, interpolating between the Gauss-Newton method and the Gradient Descent method, avoids the time-consuming computation of the inverse of the pseudo-Hessian matrix in regular singular value decomposition (SVD).
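For reference, PSNR against a ground-truth image can be computed as follows. This is a minimal sketch of the standard definition, $10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$; the peak value of 255 assumes 8-bit images, which the text does not state explicitly.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

truth = np.zeros((4, 4))
off_by_one = truth + 1.0      # every pixel differs by one grey level, so MSE = 1
score = psnr(truth, off_by_one)
```

Since the measure needs a reference image, it can only be reported for the synthetic data set, as noted above.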
Based on test results on synthetic data and real video data captured from UAS, the Conjugate Gradient method produces the best super-resolution mosaicing results in visual performance. There is almost no difference in visual performance on the super-resolution of the mosaic images between the Levenberg-Marquardt method and the Steepest Descent method. However, the experimental outcomes show that the Steepest Descent method used the least time among the three approaches to reach convergence and is the most efficient method.

Table 3. Comparison of the three proposed algorithms to compute super-resolution mosaics for real colour video frames captured by UAS.

Conclusions
Three optimization methods, the Steepest Descent method, the Conjugate Gradient method and the Levenberg-Marquardt method, are applied to estimate super-resolution mosaics. Their running efficiency and visual performance on synthetic test data and real test data collected from UAS are compared. Experimentally, the Conjugate Gradient method gives the best super-resolution mosaic results in visual performance, while the Steepest Descent method converges in the least time. There is no large difference in visual performance between the super-resolution mosaics from the Levenberg-Marquardt method and the Steepest Descent method.