Accurate moving object segmentation in unconstrained videos based on robust seed pixel selection

Due to cluttered background motion, accurate moving object segmentation in unconstrained videos remains a significant open problem, especially for slow-moving objects. This article proposes an accurate moving object segmentation method based on robust seed selection. The seed pixels of the object and background are selected robustly using optical flow cues. Firstly, this article detects the moving object's rough contour according to the local difference in the weighted orientation cues of the optical flow. Then, the detected rough contour is used to guide the object and background seed pixel selection. The object seed pixels in the previous frame are propagated to the current frame according to the optical flow to improve the robustness of the seed selection. Finally, we adopt the random walker algorithm to segment the moving object accurately according to the selected seed pixels. Experiments on publicly available data sets indicate that the proposed method shows excellent performance in segmenting moving objects accurately in unconstrained videos.


Introduction
Moving object segmentation has been widely used in monitoring systems, unmanned aerial vehicles, autonomous vehicles, and so on. Accurate segmentation of moving objects, consisting of precise position and shape information, is of great significance for subsequent object tracking and recognition. However, due to the motion of the cluttered background, existing moving object detection algorithms 1 generally obtain only a rough segmentation of the moving object (see Figure 1(c)). Most motion segmentation algorithms use the optical flow between adjacent frames to segment the moving object from the background. [2][3][4] However, these algorithms fail on slow-moving objects, for which the differences between the optical flow of the moving object and that of the background are subtle (see Figure 1(d)). Other algorithms 5-7 adopt a long-term analysis of the video to tackle this problem (see Figure 1(e)). However, they are not suitable for online processing because they use frames after the current moment.
The orientation cues of the optical flow are robust to the depth of the scene. This article selects the object and background seed pixels according to the local difference in the optical flow orientation cues, and uses temporal consistency to handle slow-moving objects. Firstly, the proposed method detects the moving object's rough contour according to the local difference of the orientation cues of the optical flow between adjacent frames. Different from the optical flow cues used in existing algorithms, [7][8][9] this article adopts weighted orientation cues, where the weight is determined adaptively based on the local magnitude cues of the optical flow. Then, the detected rough contour is used to guide the object and background seed selection via the point-in-polygon (PIP) test. 10 To further improve the robustness of the object seed selection, the object seed pixels in the previous frame are propagated to the current frame according to the optical flow. Lastly, this article uses the random walker algorithm 11 to segment the moving object accurately according to the selected seeds. The proposed method only uses the information of the current and previous frames, and can therefore achieve online accurate moving object segmentation. We evaluate the proposed method on publicly available data sets and compare it with typical algorithms. Experimental results indicate that the proposed method performs better than existing algorithms in segmenting moving objects in unconstrained videos.
This article's main contributions are twofold: (1) the proposed method uses the weighted local difference in the orientation of the optical flow to detect the rough contour of the moving object, and the detected rough contour is used to select the object and background seed pixels for accurate segmentation by the random walker algorithm; (2) the object seed pixels in the previous frame are propagated to the current frame to improve the robustness of seed selection.

Related works
Existing methods for moving object segmentation in unconstrained videos can be classified into two major categories: background-modeling-based methods and motion-segmentation-based methods. The background-modeling-based methods segment the moving object by subtracting the background component from the input frame. The background is often modeled as a codebook, 12 a Gaussian mixture model, 13 and so on. Such methods have difficulty dealing with quickly changing or cluttered backgrounds. They are more suitable for constrained monitoring systems, in which the background is fixed. The motion-segmentation-based methods use the differences between the motion of the object and that of the background to segment the moving object. The motion information is usually obtained by estimating the optical flow of the image sequence. Such methods are widely used for moving object segmentation in unconstrained videos. This article focuses on the latter category and summarizes motion-segmentation-based methods as follows.
Moving object segmentation based on sparse optical flow

Such methods estimate the motion between adjacent frames based on sparse optical flow. Estimating the sparse optical flow of feature points between adjacent frames is efficient. However, it can only distinguish the object from the background sparsely, so post-processing is needed for a dense result. The algorithms 3,4 defined moving objects as groups of pixels that are salient in motion and color, which were segmented directly through clustering by mean-shift. The clustering centers for the tracked pixels were computed from position, color, and optical flow features calculated by the method of Lucas and Kanade. 14 Then, Markov random fields (MRF) were used to segment the moving object densely based on the clustering centers. Kim et al. 15 found that background feature points are scattered more widely than those of the moving object, so the scatteredness of clustered optical flow vectors was used to classify the object and background. Nonaka et al. 16 used spatial closeness, amplitude, and direction similarity to cluster the sparse optical flow. The cluster results were classified into different labels according to their shape and size. Finally, graph cut was used to obtain a dense segmentation based on the sparse labels. The algorithms proposed by Malik and Brox, 5 Ochs et al., 6 and Sheikh et al. 17 used long trajectories of feature points to enhance the robustness of moving object segmentation. In the study of Sheikh et al., 17 the basis vectors of trajectories were extracted based on the rank constraint of the background motion trajectory matrix; the trajectories belonging to the moving object were then identified by their difference from the trajectories reconstructed from the basis vectors. In the studies of Malik and Brox 5 and Ochs et al., 6 spectral clustering was used to segment the moving object based on the affinities between long trajectories. Then, information in the spatial-temporal domain was used to obtain a densely segmented object based on the sparsely segmented trajectories.

Moving object segmentation based on dense optical flow
Motion segmentation based on sparse optical flow cannot obtain the complete object directly. Thus, some algorithms 2,7-9,18 turned to dense optical flow for segmenting moving objects. Huang et al. 2 modeled the background regions in the optical flow field. They computed a homography matrix based on the dense optical flow estimated by Ilg et al., 19 which was used to construct a new optical flow field of the background. Then, moving objects were detected by comparing the original optical flow with the constructed one. A context-aware motion descriptor was designed by Chen and Lu 18 based on the histogram of oriented optical flow to measure the inconsistency between the optical flow of the object and its surrounding background. Sajid et al. 20 used a second-order function to approximate the background motion based on its low-rank characteristics. Then, they estimated the foreground probability by comparing the motion field with the background motion approximation. Narayana et al. 8 observed that the magnitude of the optical flow is depth-dependent: objects at different depths from the camera can exhibit different optical flow although they share the same real-world motion. Thus, they segmented moving objects in the orientation field of the optical flow. The orientation fields of the background were modeled by discrete sampling of different camera movements, which cannot cover all possible camera motions. Bideau and Learned-Miller 9 also used the orientation of the optical flow to segment moving objects, where the optical flow caused by camera rotation was subtracted by a modified algorithm. 21 Then, a Bayesian probability model was used to measure how objects move differently from the background, where the segmented result in the previous frame served as the prior, and the difference of the orientation fields served as the likelihood.
Papazoglou and Ferrari 7 used the optical flow of long sequences to handle objects whose motion slows down. The difference in optical flow in the spatial domain was used to detect the coarse moving object. Then, an appearance model and MRF were used to refine the coarse result based on information in the temporal and spatial domains. Wu et al. 22 utilized dense particle trajectories to segment moving objects coarsely by reduced singular value decomposition. They then segmented fine objects according to the reconstructed background motion, which is obtained through image inpainting based on the coarse segmentation. Zhu and Elgammal 23 formulated the problem as multi-label segmentation by modeling the moving objects and the background in different layers. They assigned an independent processing layer to each moving object and the background and estimated both motion and appearance models. Koh and Kim 24 augmented the initial object regions, which are generated using both color and motion edges, with missing parts and reduced them by excluding noisy parts.
A summary and comparison of these two types of moving object segmentation algorithms are shown in Table 1.

Table 1. Comparison of some moving object detection/segmentation methods.

Algorithm | Classification model | Dense segmentation/post-processing

Sparse-optical-flow-based methods:
Bugeau and Pérez 3,4 | Mean-shift | Graph cut
Malik and Brox 5 and Ochs et al. 6 | Long trajectory + spectral clustering | Potts model
Sheikh et al. 17 | Long trajectory + rank constraint | Graph cut
Nonaka et al. 16 | Cluster growing algorithm | Graph cut

Dense-optical-flow-based methods:
Huang et al. 2 | Homography approximation | Dual-mode judge mechanism + adaptive thresholding
Papazoglou and Ferrari 7 | Motion boundaries + inside-outside maps | Spatial-temporal smoothness + graph cut
Narayana et al. 8 | Orientation field + color + prior | Gradient descent
Bideau and Learned-Miller 9 | Prior + translation angle likelihood | Maximum posterior
Chen and Lu 18 | Orientation histogram | Support vector machines
Sajid et al. 20 | Second-order approximation + appearance module | Graph cut
Wu et al. 22 | Dense particle trajectories + reduced singular value decomposition + background motion reconstruction | Adaptive thresholding
The sparse-optical-flow-based methods first adopt a classification model to label sparse points based on their optical flow and other characteristics. Then, they usually turn to a graph model for dense segmentation based on the sparse seed points. The dense-optical-flow-based methods can classify all pixels directly with different classification models according to the pixels' features. Usually, prior information, such as temporal-spatial continuity, is introduced to eliminate noise in the final segmentation. The background optical flow is caused by the motion of the camera platform, while the object optical flow is caused by the object's own motion plus that of the camera. There may be large differences in the optical flow of different background regions; however, the flow has the property of local smoothness. In other words, for most camera motion types, the difference in optical flow between two adjacent background pixels is small. When a moving object appears, there is some difference between the optical flow of the object and that of its surrounding background. Therefore, the local difference of the orientation and magnitude fields of the optical flow, calculated as in the algorithm of Nonaka et al., 16 is used in this article to detect the rough contours of the moving object, which in turn are used to select the seed pixels of the object and background. Then, we use the random walker algorithm 11 to segment the moving object based on the spatial distribution continuity of the object and background. To further improve robustness, temporal motion is utilized to propagate the seed pixels of the object from the previous frame to the current frame.

Methodology
This article segments the moving object accurately in unconstrained videos based on robust seed pixel selection. The flowchart of the proposed method is shown in Figure 2. There are three main steps in our method: rough contour detection using the weighted local orientation cues of the optical flow, robust seed pixel selection, and moving object segmentation.

Rough contour detection using the weighted local orientation difference of optical flow
The distribution of the optical flow shows an apparent discontinuity next to the contour of the moving object. Thus, the contour of the moving object can be detected by calculating the local difference of the optical flow. The background motion projected onto the image depends on the distance to the camera, that is, the depth of the scene. The optical flow of background regions may therefore differ, although the regions share the same real-world motion. Thus, the optical flow vector is not a robust cue for moving object segmentation, especially in the presence of cluttered backgrounds. The orientation of the optical flow is independent of the depth of the scene 8 in the case of camera translation, and the orientation field of the background region changes smoothly in the spatial domain. To avoid erroneous detections caused by the cluttered background, this article calculates the local differences of the orientation and magnitude fields of the optical flow, respectively. This article computes the cosine of the orientation difference, and the minimum cosine value over the neighborhood gives the map of local orientation difference OD_i for the pixel p_i:

OD_i = min_{p_j ∈ N_i} cos∠((u_i, v_i), (u_j, v_j))    (1)

where (u_i, v_i) is the optical flow vector of the pixel p_i, ∠ denotes the angle between the two vectors, and N_i is the neighborhood of pixel p_i, set to 3 × 3 in this study.
The local difference between the optical flow orientation cues may be small when the motion directions of the object and the background are similar. Moreover, the orientation difference of the pixels near the optical axis will be dramatically large when the camera moves along the optical axis, which often happens in autonomous driving. There may also be some variation in the orientation field within the object and background regions because of inaccurately estimated optical flow. Therefore, this article uses the magnitude of the optical flow to weight the orientation difference when detecting the contours of the moving object. The maximum magnitude difference between adjacent pixels is used as the weight, as in equation (2):

MD_i = max_{p_j ∈ N_i} | ||(u_i, v_i)|| − ||(u_j, v_j)|| |    (2)

As shown in equation (3), the weighted orientation difference (WOD) is calculated by multiplying OD with the normalized magnitude difference MD̂:

WOD_i = OD_i × MD̂_i    (3)

The maps of local differences in optical flow obtained in different ways are shown in Figure 3. There are some spurious boundaries inside the object region in the map of the local orientation difference because of inaccurate optical flow, as in Figure 3(b). Spurious boundaries also occur in background regions in the map obtained from the magnitude difference alone. By weighting the orientation difference with that of the magnitude, we obtain a cleaner local difference map, as shown in Figure 3(d).
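As an illustration, the WOD map described by equations (1)-(3) can be sketched with NumPy on a dense flow field of shape (H, W, 2). Two details are our assumptions rather than the paper's: the minimum cosine is converted into a dissimilarity (1 − cos) before weighting, and MD is normalized to [0, 1] by its maximum.

```python
import numpy as np

def weighted_orientation_difference(flow, eps=1e-8):
    # Sketch of the WOD map (equations (1)-(3)): orientation
    # dissimilarity weighted by the normalised magnitude difference.
    h, w, _ = flow.shape
    mag = np.linalg.norm(flow, axis=2)
    unit = flow / (mag[..., None] + eps)          # unit flow vectors
    upad = np.pad(unit, ((1, 1), (1, 1), (0, 0)), mode="edge")
    mpad = np.pad(mag, 1, mode="edge")
    min_cos = np.full((h, w), 1.0)
    md = np.zeros((h, w))
    for dy in (-1, 0, 1):                         # scan the 3x3 neighbourhood
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neigh_u = upad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            neigh_m = mpad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            min_cos = np.minimum(min_cos, np.sum(unit * neigh_u, axis=2))
            md = np.maximum(md, np.abs(mag - neigh_m))
    od = 1.0 - min_cos                            # assumption: cosine -> dissimilarity
    md_hat = md / (md.max() + eps)                # normalise MD to [0, 1]
    return od * md_hat
```

A uniform flow field yields a zero WOD map, while a pixel whose flow reverses direction against its neighbours receives a large response, which is the behaviour the contour detector relies on.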
The rough contour of the moving object is detected from the WOD map via a two-step thresholding method. Firstly, we detect the definite contours by thresholding WOD with a larger value T_c1. Then, a smaller threshold T_c2 is applied to the pixels adjacent to the definite contours to obtain more complete contours. The values of the thresholds are determined by the mean μ and standard deviation σ of WOD, as in equation (4).

Robust seed pixels selection

Based on the rough contours detected in the previous section, this article selects the seed pixels of the object and the background using the PIP test. 10 Pixels belonging to the moving object should be located inside the contours, while pixels of the background should be located outside of the contours. The PIP algorithm 10 determines simply whether a point is inside or outside the contours. As shown in Figure 4(a), the PIP 10 draws a ray from the current point to infinity. The point lies inside the contours if the number of intersections between the ray and the contours is odd (as for the red points in Figure 4(a)). If the number of intersections is even, the point lies outside the contours (as for the blue point in Figure 4(a)).
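As a sketch of the contour extraction described above, the two-step thresholding on the WOD map can be written as a hysteresis-style procedure: the multipliers k1 and k2 below are illustrative placeholders, since the exact coefficients of equation (4) are not reproduced here.

```python
import numpy as np

def two_step_threshold(wod, k1=2.0, k2=1.0):
    # Thresholds derived from the mean and standard deviation of WOD.
    # k1, k2 are illustrative stand-ins for the paper's equation (4).
    t1 = wod.mean() + k1 * wod.std()   # larger threshold: definite contours
    t2 = wod.mean() + k2 * wod.std()   # smaller threshold: candidate pixels
    weak = wod >= t2
    contour = wod >= t1
    h, w = wod.shape
    while True:
        # Dilate the current contours by one pixel and keep only candidates.
        pad = np.pad(contour, 1)
        dil = np.zeros_like(contour)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                dil |= pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        grown = dil & weak
        if np.array_equal(grown, contour):
            return contour
        contour = grown
```

Because the definite set is a subset of the candidate set, the loop grows monotonically and terminates; the effect is that moderate WOD responses survive only when they touch a strong contour response.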
To improve the robustness of the seed selection against incomplete rough contours, this study emits rays from the current pixel in multiple directions, as shown in Figure 4(b). We count the number of rays with an odd number of intersections for each pixel; this number is denoted In_i for the pixel p_i. Then, this study selects the seed pixels of the object and background according to In_i. As shown in Figure 4(b), rays are emitted in eight directions. The probability of the red point being inside the contours is 5/8 in Figure 4(b), and it is 2/8 for the blue point. Let R represent the area inside the bounding box of the rough contours. Definite background pixels cannot lie inside the bounding box of the detected contours, and definite object pixels cannot lie outside it. Thus, this study selects the seed pixels of the background and the object as in equation (5): a pixel p_i ∈ R with In_i ≥ T_1 is an object seed, and a pixel outside R with In_i ≤ T_2 is a background seed. The threshold T_1 is a larger constant to ensure the pixels definitely belong to the object, and T_2 must be small enough to select definite background pixels. They are set to 7 and 1 in this study, respectively. The tuple S_i corresponding to the pixel p_i consists of the probabilities that p_i belongs to the background and the moving object.

A slowly moving object makes it difficult to obtain seed pixels robustly from the incomplete rough contour. To tackle this problem, this study propagates the object seed pixels in the previous frame to the current frame through the optical flow. The object seed pixel p_{i,t−1} in the previous frame is propagated to the pixel p_{j,t} in the current frame by the optical flow vector (u_{i,t−1}, v_{i,t−1}), as in equation (6):

x_{j,t} = x_{i,t−1} + u_{i,t−1},  y_{j,t} = y_{i,t−1} + v_{i,t−1}    (6)

Further, p_{j,t} is validated as in equation (7): p_{j,t} is accepted as an object seed pixel only if it lies inside the bounding box of the object contours and In_{j,t} is larger than a threshold T_3, where T_3 is smaller than T_1 because of the temporal constraint and is set to 4 in this study.
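The multi-direction ray test can be sketched as a raster approximation: from each non-contour pixel, walk along the eight compass directions and count entries into contour runs, then apply the In_i thresholds (T_1 = 7, T_2 = 1, as in the paper). For brevity this sketch omits the bounding-box constraint R and the temporal propagation step.

```python
import numpy as np

DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
        (0, 1), (1, -1), (1, 0), (1, 1)]

def inside_count(contour, y, x):
    # Number of the 8 rays from (y, x) that cross the contour an odd
    # number of times (raster approximation of point-in-polygon).
    h, w = contour.shape
    odd = 0
    for dy, dx in DIRS:
        crossings, prev_on = 0, False
        cy, cx = y + dy, x + dx
        while 0 <= cy < h and 0 <= cx < w:
            on = contour[cy, cx]
            if on and not prev_on:   # entering a contour run counts once
                crossings += 1
            prev_on = on
            cy += dy
            cx += dx
        odd += crossings % 2
    return odd

def select_seeds(contour, t1=7, t2=1):
    # Seed labelling in the spirit of equation (5): object seeds need
    # In_i >= t1, background seeds In_i <= t2 (bounding-box test omitted).
    h, w = contour.shape
    obj = np.zeros((h, w), bool)
    bg = np.zeros((h, w), bool)
    for y in range(h):
        for x in range(w):
            if contour[y, x]:
                continue
            n = inside_count(contour, y, x)
            obj[y, x] = n >= t1
            bg[y, x] = n <= t2
    return obj, bg
```

On a closed square contour, the centre pixel scores 8/8 and becomes an object seed, while a far corner scores 0/8 and becomes a background seed; pixels with intermediate counts remain unlabelled and are left to the random walker.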

The moving object segmentation
In this section, this study segments the moving object accurately through the random walker algorithm 11 based on the seed pixels selected in the "Robust seed pixels selection" section. Given the seed pixels, the random walker algorithm labels the remaining pixels according to the continuity of the spatial intensity distribution. The random walker 11 is defined on a graph G = (V, E) consisting of nodes v ∈ V and edges e ∈ E ⊆ V × V. The edge e_ij connecting two adjacent nodes v_i and v_j is assigned a weight w(e_ij) measuring the similarity of v_i and v_j. The graph is 4-connected and undirected in this study. We define the weight as

w(e_ij) = exp(−l ((R_i − R_j)² + (G_i − G_j)² + (B_i − B_j)²))    (8)

where R_i, G_i, and B_i are the red, green, and blue intensities of the pixel p_i, and l is a positive coefficient set to 90. 11 The degree of v_i is defined as the sum of the weights of all edges e_ij connected to v_i:

d_i = Σ_j w(e_ij)    (9)

The element l_ij of the combinatorial Laplacian matrix L can be calculated as

l_ij = d_i if i = j; −w(e_ij) if v_i and v_j are adjacent; 0 otherwise    (10)

L is a symmetric matrix. We rearrange it so that the elements corresponding to the seed pixels are located in the top-left part L_M and those of the unknown pixels in the bottom-right part L_U, with the off-diagonal block B:

L = [ L_M  B ; B^T  L_U ]    (11)

The probabilities of the unknown pixels belonging to the moving object and the background can then be calculated by solving the linear system

L_U S_U = −B^T S_M    (12)

where S_M is the label tuple of the seed pixels, which can be obtained by equations (5) and (7), and S_U is the probability vector for the unknown pixels. The mask M of the moving object is obtained by comparing the two elements of S_i, as in equation (13). Algorithm 1 summarizes the overall procedure of the proposed moving object segmentation method.
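A minimal random-walker sketch in the style of Grady's formulation is given below. For brevity it uses a grayscale image in [0, 1] (the paper weights RGB differences with l = 90) and a dense solver; practical implementations build L as a sparse matrix and use a sparse solver.

```python
import numpy as np

def random_walker(img, obj_seeds, bg_seeds, beta=90.0):
    # Minimal random-walker segmentation on a 4-connected grid:
    # weights w_ij = exp(-beta * (g_i - g_j)^2), then solve the
    # Dirichlet problem L_U x_U = -B^T x_M for unseeded pixels.
    h, w = img.shape
    n = h * w
    g = img.ravel()
    idx = np.arange(n).reshape(h, w)
    L = np.zeros((n, n))
    # Right and down neighbours cover every undirected edge once.
    for dy, dx in ((0, 1), (1, 0)):
        a = idx[:h - dy, :w - dx].ravel()
        b = idx[dy:, dx:].ravel()
        wt = np.exp(-beta * (g[a] - g[b]) ** 2)
        L[a, b] -= wt
        L[b, a] -= wt
        L[a, a] += wt       # accumulate node degrees on the diagonal
        L[b, b] += wt
    seeded = (obj_seeds | bg_seeds).ravel()
    x = np.zeros(n)
    x[obj_seeds.ravel()] = 1.0          # object seeds get probability 1
    un = ~seeded
    L_U = L[np.ix_(un, un)]
    B = L[np.ix_(un, seeded)]
    x[un] = np.linalg.solve(L_U, -B @ x[seeded])
    return x.reshape(h, w)
```

With a two-region test image and one seed per region, the solved probabilities stay near 1 inside the object region and near 0 in the background, and thresholding at 0.5 yields the final mask of equation (13).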

Experiment settings
To evaluate the performance of the proposed method, this study conducts experiments on eight standard benchmark sequences from the Moseg_dataset. 5 In addition, we compare our method with three existing typical methods. 1,2,7 The results of these algorithms 1,2,7 are all produced by the code downloaded from the authors' home pages. The optical flow used in the previous methods 2,7 and the proposed method is calculated with the same algorithm. 25 The parameters of the compared algorithms are set as the authors recommend. Moreover, all segmentation results of the proposed method are obtained with the same parameters.

Experiment results and analysis
Sample experimental results are shown in Figure 5. The results show that the proposed method obtains a more accurate moving object than the three compared methods. MCD 1 tends to obtain incomplete objects because only a single Gaussian model is used to model the background, which is insufficient and does not make use of spatial information; at the same time, the homography transformation cannot accurately compensate for background motion in some scenes. The algorithm of Huang et al. 2 cannot deal with the slow-motion challenge because only the optical flow is used to segment moving objects, and the homography transformation is used to represent the background movement. The contour of the object cannot be extracted accurately by Papazoglou and Ferrari's method. 7 To compare the robustness of the four algorithms on slow-moving objects, the segmentation results of three consecutive frames in two sequences are shown in Figure 6. The algorithm of Huang et al. 2 only uses the optical flow information of the current frame; when the object slows down, the algorithm cannot detect it accurately and may even lose it. The algorithm of Papazoglou and Ferrari 7 makes use of information in the temporal domain of the image sequence. However, the segmentation result is greatly affected by the motion intensity of the object, so the object may not be segmented in some consecutive frames. The proposed method propagates the object's seed pixels from the previous frame to the current frame according to the optical flow, and can therefore segment the object effectively even when its motion becomes weak.
The overlap between the detected object and the ground truth is used to evaluate the precision of our algorithm quantitatively. Given the segmented object mask M_t and the ground truth M_g, the overlap score S is defined as

S = |M_t ∩ M_g| / |M_t ∪ M_g|    (14)

where ∩ and ∪ are the intersection and union operators, respectively, and |·| is the number of pixels in the mask. The average overlap scores for each sequence of the compared and proposed algorithms are shown in Table 2, from which we can see that the proposed method outperforms the other three methods on most sequences. This illustrates that our algorithm can segment the moving object completely and produces fewer false alarms, owing to the use of the continuity of the object and background distributions in the spatial domain via the random walker.
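The overlap score of equation (14) is the Jaccard index of the two binary masks, which can be computed directly:

```python
import numpy as np

def overlap_score(mask, gt):
    # Overlap (Jaccard) score of equation (14): |M_t ∩ M_g| / |M_t ∪ M_g|.
    inter = np.logical_and(mask, gt).sum()
    union = np.logical_or(mask, gt).sum()
    return inter / union if union else 1.0
```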
To evaluate the performance of the algorithms more precisely, this study uses two further metrics, the precision rate and the recall rate, which measure the false alarms and the misses of the methods, respectively. In this study, the precision P and recall rate R are defined as in equation (15):

P = |M_t ∩ M_g| / |M_t|,  R = |M_t ∩ M_g| / |M_g|    (15)

A larger P indicates fewer false positives in the segmentation result, while a larger R indicates a more complete object segmentation with fewer misses. The averages of P, R, and S over all sequences for the compared and proposed algorithms are shown as bar graphs in Figure 7. As shown in Figure 7, the proposed method performs better than the compared algorithms 1,2,7 in terms of precision and recall rate. Compared with the algorithm of Huang et al., 2 which performs best among the compared methods, the proposed algorithm improves precision, recall rate, and overlap score by 8.0%, 4.37%, and 7.28%, respectively.
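Precision and recall follow directly from the same mask intersections; the formulas below are the standard definitions, which equation (15) is assumed to match:

```python
import numpy as np

def precision_recall(mask, gt):
    # P = |M_t ∩ M_g| / |M_t|,  R = |M_t ∩ M_g| / |M_g|
    # (standard definitions, assumed for equation (15)).
    inter = np.logical_and(mask, gt).sum()
    p = inter / mask.sum() if mask.sum() else 1.0
    r = inter / gt.sum() if gt.sum() else 1.0
    return p, r
```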

The computation efficiency of the proposed method
We measure the computation time to evaluate the efficiency of the proposed and the compared methods. 1,2,7 Table 3 shows the average computation time per frame of the four algorithms, measured on an Intel Core i5-6200U 2.4 GHz PC at a resolution of 640 × 480, where the method of Yi et al. 1 is implemented in C++ and the other three in MATLAB R2018a. The time spent on calculating optical flow is excluded for the methods of Huang et al. 2 and Papazoglou and Ferrari 7 and the proposed method.
The algorithm of Huang et al. 2 takes the least time because its core is to calculate the homography on the downsampled optical flow. The low computational efficiency of the algorithm of Papazoglou and Ferrari 7 is due to its need to calculate superpixels for each frame and to apply graph cut in each frame for segmentation. To improve the efficiency of the proposed method, this study extracts a region of interest (ROI) from the original image for segmenting the object, where the ROI must include the bounding box of the rough contours, and the pixels on its boundary are all set to background seed pixels. Thus, the computation time of the moving object segmentation process in the proposed method has little correlation with the resolution of the image.

Discussion
This study adopts a coarse-to-fine strategy to segment moving objects under a moving camera. The proposed method utilizes the local difference of the optical flow field to detect the rough object contours, which are used to select the seed pixels of the object and background. Then, this study performs the object extraction locally with the random walker, which segments the moving object efficiently and accurately. A moving object segmentation algorithm should make full use of the information in the spatial and temporal domains; however, too strong a temporal constraint will lead to missed detections. In the temporal domain, the proposed algorithm propagates the object seed pixels to the next frame to improve robustness. Because it enforces object continuity over a broad range of frames, the algorithm of Papazoglou and Ferrari 7 may fail to segment the moving object for a long time after missed detections in a few frames. This study only uses the seed pixels of the last frame as supplementary information. Moreover, the proposed method can serve as an online moving object detection method because it only uses information up to the current frame.

Conclusion
This study proposes an accurate moving object segmentation method for unconstrained videos based on robust seed selection. The seed pixels of the object and background are selected robustly using optical flow cues. The proposed method uses the weighted local difference in the orientation of the optical flow to detect the moving object's rough contours. Furthermore, the detected rough contours are used to guide the object and background seed pixel selection. To further improve robustness for slow-moving objects, the object seed pixels in the previous frame are propagated to the current frame. Experiments conducted on publicly available data sets indicate that the proposed method shows excellent performance in accurate moving object segmentation, especially for slow-moving objects. For future work, we will try to select the seed pixels of the object and background based on sparse optical flow for higher computational efficiency. Besides, the algorithm will be ported to C++ and accelerated with CUDA to improve computational efficiency.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.