Accurate hierarchical stereo matching based on 3D plane labeling of superpixel for stereo images from rovers

An accurate hierarchical stereo matching method is proposed based on continuous 3D plane labeling of superpixels for rover stereo images. By combining a slanted-patch matching strategy with coarse-to-fine constraints, the method infers the 3D plane label of each pixel, which makes it especially suitable for matching large-scale scenes with low-texture or textureless regions. At every level, matching based on superpixel segmentation makes the iteration converge faster and avoids large amounts of redundant computation. In the coarse-to-fine matching scheme, we propose a disparity constraint and a 3D normal vector constraint between adjacent levels, through which the disparity map and 3D normal vector map at a coarser level restrict the search range of disparity and normal vector at the finer level. Experimental results on the Chang'e-3 rover dataset and the KITTI dataset show that the proposed method is more efficient and more accurate than the state-of-the-art 3D labeling algorithm, especially in low-texture or textureless regions. The method is about five to six times faster than the state-of-the-art 3D labeling method, and its accuracy is better.


Introduction
3D information is central to robotic visual perception [1,2]. Stereo matching is an important step of disparity calculation and 3D reconstruction in binocular vision [3] and has been widely studied. However, occlusions and weak-texture or textureless regions remain challenging, especially for rover stereo images.
In recent years, matching methods based on 3D labels have increased the accuracy of stereo matching. They estimate not only the disparity of each pixel but also its 3D normal vector. To infer 3D labels efficiently, many methods successfully use PatchMatch [4][5][6], which can estimate a 3D plane at each pixel [7]. Inspired by the randomized search and propagation of PatchMatch [6,8], many optimization methods based on belief propagation (BP) [4] or graph cut (GC) [9] have been proposed for efficient inference in pairwise Markov random fields (MRFs) with large label spaces. To the best of our knowledge, the local expansion moves (LocalExp) method [10], which uses local expansion moves based on GC, is the state-of-the-art method.
The LocalExp method nevertheless has three problems. The first problem is that it uses three grid structures with cell sizes of 5 × 5, 15 × 15, and 25 × 25 pixels, which may lead to huge redundant computations. In contrast, stereo matching based on superpixels assumes that pixels in the same superpixel belong to the same 3D surface. Therefore, we propose superpixel-based segmentation, which has low complexity and makes the iteration converge faster. The second problem is that the LocalExp method cannot handle low-texture regions well. As shown in Figure 1(a), the typical characteristic of this image is that it contains low-texture or even textureless regions. The disparity map generated with the LocalExp method (https://github.com/t-taniai/LocalExpStereo) is shown in Figure 1(b); there are many mismatches and unreliable matching points at the bottom and left side of the disparity map. The third problem is that it is very difficult to assign a suitable 3D label to each pixel from the infinite continuous label space. The LocalExp method [10] therefore iteratively applies local expansion moves using GC to update and propagate local disparity planes, which not only increases the computation but may also produce some wrong matching results. In this article, building on the original LocalExp method, we introduce a coarse-to-fine stereo matching method using 3D plane labeling of superpixels. The coarsest level improves the matching robustness, especially in low-texture regions, while the other levels maintain this robustness because they make full use of the disparity constraint and normal vector constraint between two adjacent levels.
Overall, the major contributions of this work are as follows: (1) A coarse-to-fine stereo matching framework combined with 3D labels is proposed, which is especially suitable for the stereo matching of low-texture or textureless regions in large-scale real scenes. (2) A matching method based on superpixel segmentation is proposed, which makes the iteration convergence faster and avoids huge redundant computations. (3) We propose disparity constraint and normal vector constraint between two adjacent levels, through which the disparity map and 3D normal vector map at a coarser level are used to restrict the search range of disparity and normal vector at a finer level.
The remainder of the article is arranged as follows: related works are presented in the second section; the proposed method is given in the third section; and the experimental results and conclusion are given in the fourth and fifth sections, respectively.

Related works
Coarse-to-fine matching
The coarse-to-fine stereo matching strategy was introduced into many matching methods a few decades ago [11]. This kind of method first calculates a coarse-resolution disparity and then uses the coarse disparity to constrain the disparity search range when calculating the fine disparity [12]. To speed up convergence, this strategy has been widely used in global matching methods. Van Meerbergen et al. [13] propose a hierarchical stereo matching method using dynamic programming, which achieves a tremendous gain in memory and speed. A hierarchical strategy using semiglobal matching first generates a half-resolution disparity map, which is then used for the disparity computation of the original image [14]. Li et al. [15] propose a coarse-to-fine stereo matching method based on more-global matching (MGM), which improves the disparity accuracy compared with the original MGM [16]. Some BP algorithms [17,18] use a coarse-to-fine strategy to reduce the complexity. A coarse-to-fine bilateral disparity structure method based on GC [19] is introduced to reduce the computational cost and improve the disparity accuracy. In addition to the global methods, local methods also adopt the coarse-to-fine strategy, mainly to reduce the disparity search range when computing the finer disparity. A multiscale, multiwindow stereo matching method is proposed to estimate disparity for restricting the disparity search range [20]. A confidence-based multiscale stereo matching strategy has also been proposed [21], which obtains higher-resolution disparity maps by processing existing lower-resolution ones.

3D label-based stereo matching
Matching cost calculation based on 3D labels differs from that of stereo matching based on discrete disparities. For each pixel in one image, many corresponding regions can be obtained from the slanted plane of the pixel according to different 3D labels. Therefore, the major issue with 3D-label stereo matching is its computational complexity.
Yamaguchi et al. [22] propose an energy function with a hybrid MRF, which contains continuous variables for 3D slanted planes and discrete random variables for each pair of neighboring segments. A slanted plane model is optimized with simple linear iterative clustering (SLIC) segmentation based on a pre-estimated disparity map [23], and the states of occlusion and coplanarity over adjacent segment pairs are analyzed simultaneously. Bleyer et al. [7] propose to segment the scene into planar superpixels and estimate each pixel's optimal 3D plane among all the possible slanted planes. Olsson et al. [24] represent second-order surface smoothness with 3D labels. Heise et al. [25] propose to integrate the PatchMatch method into a variational smoothing formulation. Li et al. [26] use pixelwise 3D label optimization by iteratively fusing multilayer superpixels. Taniai et al. [9,10] propose to use LocalExp based on GC.

Methodology
The proposed matching method constructs a hierarchical architecture and works in a coarse-to-fine scheme. The advantage of this method is that some mismatches can be resolved at a coarse level, especially in low-texture or textureless regions. Given a robust disparity map at the coarser level, the disparity at the finer level can be correctly reconstructed based on the disparity constraint and the normal vector constraint.
The two input images are first rectified by epipolar rectification [27] and then decomposed into an $L$-level pyramid. As can be seen from Figure 2, the matching process of the two images at the coarsest level $L$ mainly consists of three steps: superpixel segmentation, iterative optimization based on superpixels, and postprocessing. From level $l = L - 1$ to level $l = 1$ (the finest level), the disparity map at the coarser level guides the disparity search range at the finer level. For each of these levels, the disparity map is computed by the following four steps: coarse-to-fine disparity constraint and 3D label constraint, superpixel segmentation, iterative optimization based on superpixels, and postprocessing. Here, postprocessing is a left-to-right consistency check [15].
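The coarse-to-fine control flow described above can be sketched in Python as follows. This is only an illustrative skeleton under stated assumptions: the pyramid is built by 2 × 2 block averaging (the article does not specify the downsampling filter), and `downsample2`, `build_pyramid`, `coarse_to_fine`, and the per-level matcher `match_level` are hypothetical names rather than the authors' implementation.

```python
import numpy as np

def downsample2(img):
    """Halve the resolution by averaging 2x2 blocks (an odd last row/column is cropped)."""
    h2, w2 = img.shape[0] // 2, img.shape[1] // 2
    img = img[:h2 * 2, :w2 * 2].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def build_pyramid(img, num_levels):
    """Return [level 1 (finest), ..., level L (coarsest)]."""
    levels = [img.astype(np.float64)]
    for _ in range(num_levels - 1):
        levels.append(downsample2(levels[-1]))
    return levels

def coarse_to_fine(left, right, num_levels, match_level):
    """Run a per-level matcher from the coarsest level down to the finest.

    match_level(left, right, prior) is a hypothetical callback that returns the
    result of one level; prior is None at the coarsest level and the result of
    the next coarser level otherwise.
    """
    left_pyr = build_pyramid(left, num_levels)
    right_pyr = build_pyramid(right, num_levels)
    prior = None
    for level in range(num_levels - 1, -1, -1):  # coarsest first
        prior = match_level(left_pyr[level], right_pyr[level], prior)
    return prior
```

In such a skeleton, `match_level` would wrap segmentation, iterative optimization, and postprocessing for one level, and would use `prior` to restrict its search ranges.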

Superpixel-based image segmentation
The stereo images are first decomposed into many nonoverlapping segments with the SLIC algorithm [28]. Figure 3(a) shows an image with about 200 SLIC superpixels. Then, as shown in Figure 3(b), we define three windows for each superpixel $S_i$. The unit window $U_i$ represents the extended window of the minimum enclosing rectangle (MER) of $S_i$. The optimization window $O_i$ is an extended window, which extends $r$-pixel width around the MER of the neighboring superpixels of $S_i$. The affine transformation window $A_i$ is a rectangular window, which extends $r$-pixel width around $O_i$.
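One possible reading of the three windows can be sketched in Python from a superpixel label map. The sketch assumes that neighboring superpixels are those sharing a 4-connected boundary with $S_i$, that $O_i$ encloses $S_i$ together with its neighbors, and that the extension of $U_i$ is a free parameter (`unit_margin`); none of these details is fixed by the text, and all function names are hypothetical.

```python
import numpy as np

def bounding_rect(mask):
    """Minimum enclosing rectangle (x0, y0, x1, y1), inclusive, of a boolean mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def expand(rect, margin, width, height):
    """Grow a rectangle by `margin` pixels on every side, clamped to the image."""
    x0, y0, x1, y1 = rect
    return (max(x0 - margin, 0), max(y0 - margin, 0),
            min(x1 + margin, width - 1), min(y1 + margin, height - 1))

def neighbor_labels(labels, i):
    """Labels of the superpixels that touch superpixel i (4-connectivity)."""
    mask = labels == i
    grown = np.zeros_like(mask)
    grown[1:, :] |= mask[:-1, :]
    grown[:-1, :] |= mask[1:, :]
    grown[:, 1:] |= mask[:, :-1]
    grown[:, :-1] |= mask[:, 1:]
    return set(np.unique(labels[grown & ~mask]).tolist())

def superpixel_windows(labels, i, r, unit_margin=0):
    """Return (U_i, O_i, A_i) as inclusive rectangles (x0, y0, x1, y1)."""
    h, w = labels.shape
    unit = expand(bounding_rect(labels == i), unit_margin, w, h)
    group = np.isin(labels, list(neighbor_labels(labels, i) | {i}))
    opt = expand(bounding_rect(group), r, w, h)   # MER of S_i and its neighbors, plus r
    affine = expand(opt, r, w, h)                 # O_i plus another r pixels
    return unit, opt, affine
```

A label map of this kind could come from any SLIC implementation; only the integer label image is needed here.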

Iterative randomized optimization based on superpixel
Using the three defined windows $U_i$, $O_i$, and $A_i$ of each $S_i$, the superpixel-based stereo matching is described as follows. Let $I$ be a set of 3D labels, which correspond to disparities; we randomly choose a pixel from $U_i$ and assign it a 3D label $f \in I$. The label $f$ is typically estimated by minimizing the following energy function
$$E(f) = \sum_{p} E_p(f_p) + \lambda \sum_{p} \sum_{q \in N(p)} E_{pq}(f_p, f_q) \qquad (1)$$
where $\lambda$ is a coefficient. $E_p(f)$, called the data term, is the cost of pixel $p$ with label $f$ and measures the photoconsistency between matching pixels. $E_{pq}(f_p, f_q)$, called the smoothness term, is the cost of assigning labels $f_p$ and $f_q$ to pixel $p$ and its neighboring pixels $q \in N(p)$.
(1) Data term
For a pixel $p(p_x, p_y) \in O_i$, its disparity $d_p$ is calculated from a 3D plane, $d_p = a p_x + b p_y + c$, to avoid the fronto-parallel problem. The aim therefore becomes to seek an optimal 3D label $f$ for every pixel in the left and right images that minimizes the energy function $E(f)$. The data term of $p$ in the left image is then defined as
$$E_p(f) = \sum_{s \in W_p} \omega_{ps}\, \rho(s, w_f(s)) \qquad (3)$$
where $W_p$ is a window centered at $p$ with a radius of $r$ pixels (shown in Figure 3(b)). Here, we use the same weight $\omega_{ps}$ as proposed in the LocalExp method [10]. Using the 3D label $f = (a, b, c)$, the function $\rho(s, w_f(s))$ measures the pixel dissimilarity between $s(s_x, s_y)$ in window $W_p$ of the left image and its matching point $w_f(s)$ in the right image, which is defined as
$$\rho(s, w_f(s)) = (1 - \alpha) \min(\|I_L(s) - I_R(w_f(s))\|_1, \tau_{col}) + \alpha \min(\|\nabla I_L(s) - \nabla I_R(w_f(s))\|_1, \tau_{grad})$$
where $\nabla I$ represents the gradient of image $I$, and $\alpha$ is a factor. The two terms are truncated by $\tau_{col}$ and $\tau_{grad}$ to increase the robustness for occluded regions. $I_R(w_f(s))$ is the RGB or gray value of the corresponding pixel in the right image.
As shown in Figure 4, for each affine transformation window $A_i$ in the left image, the corresponding windows in the right image can be obtained by affine transformation with different 3D labels $f = (a, b, c)$.
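To make the slanted-plane data cost more concrete, the following Python sketch evaluates the plane-induced disparity $d = ax + by + c$ and a truncated color-plus-gradient dissimilarity for a single pixel. It is a simplified illustration under several assumptions that are not taken from the article: grayscale images, precomputed horizontal gradient images, a matching point at column $x - d$ on the same row (the sign convention may differ for a given dataset), linear interpolation in the right image, and a $(1 - \alpha)$/$\alpha$ blend of the two truncated terms. The function names are hypothetical; the default parameter values are those listed in the experimental settings.

```python
import numpy as np

def plane_disparity(label, x, y):
    """Disparity induced at pixel (x, y) by the 3D label (a, b, c)."""
    a, b, c = label
    return a * x + b * y + c

def sample_row(img, x, y):
    """Linearly interpolate a gray image at fractional column x on row y (clamped)."""
    w = img.shape[1]
    x = min(max(float(x), 0.0), w - 1.0)
    x0 = int(np.floor(x))
    x1 = min(x0 + 1, w - 1)
    t = x - x0
    return (1.0 - t) * img[y, x0] + t * img[y, x1]

def dissimilarity(left, right, grad_l, grad_r, label, x, y,
                  alpha=0.9, tau_col=10.0, tau_grad=2.0):
    """Truncated color + gradient dissimilarity of left pixel (x, y) under `label`."""
    d = plane_disparity(label, x, y)
    xr = x - d  # assumed convention: the match lies d pixels to the left
    col = abs(left[y, x] - sample_row(right, xr, y))
    grad = abs(grad_l[y, x] - sample_row(grad_r, xr, y))
    return (1.0 - alpha) * min(col, tau_col) + alpha * min(grad, tau_grad)
```

Summing such values over a window with adaptive weights would give a data term of the kind described above; the weights themselves are not sketched here.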

(2) Smoothness term
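The smoothness term is defined in the same way as in the literature [10]:
$$E_{pq}(f_p, f_q) = \max(\omega_{pq}, \epsilon)\, \min(\psi_{pq}(f_p, f_q), \tau_{dis}) \qquad (5)$$
where $\psi_{pq}(f_p, f_q)$ is truncated at $\tau_{dis}$, and $\omega_{pq}$ is defined as
$$\omega_{pq} = \exp(-\|I_L(p) - I_L(q)\|_1 / \gamma)$$
where $\gamma$ is a parameter, and $\epsilon$ is a small constant value. The function $\psi_{pq}(f_p, f_q)$ penalizes the discontinuity between $f_p$ and $f_q$ in terms of disparity as
$$\psi_{pq}(f_p, f_q) = |d_p(f_p) - d_p(f_q)| + |d_q(f_q) - d_q(f_p)|$$
where $d_p(f) = a p_x + b p_y + c$ is the disparity at $p$ given by label $f = (a, b, c)$. The first term measures the difference between $f_p$ and $f_q$ by their disparity values at $p$, and the second term is defined similarly at $q$.

(3) Iterative randomized optimization
We use a local expansion method similar to the LocalExp method [10] to perform iterative randomized optimization over superpixels for all possible labels $\alpha \in I$, where $I$ is a continuous 3D space. Here, every pixel $p$ is assigned either its current label $f_p^{(t)}$ or a candidate label $\alpha$; that is, the binary choice is $f'_p \in \{f_p^{(t)}, \alpha\}$.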
Our expansion method also works in two ways: localization and spatial propagation. By localization, we use different candidate labels $\alpha$, instead of using the same label $\alpha$ for all pixels. By spatial propagation, we propagate currently assigned 3D labels to nearby pixels via GC.
Our iterative local expansion is shown in Algorithm 1, which, similarly to the LocalExp method [10], includes propagation (lines 1-4), RANSAC (lines 5-8), and refinement (lines 9-14) steps. Except for the refinement step, the steps in our local expansion are the same as those in the LocalExp method [10].
Algorithm 1. Iterative randomized optimization
The candidate label $\alpha = (a, b, c)$ of pixel $r(x, y)$ can be converted to a disparity $d = ax + by + c$ and a normal vector $n$. For each iteration $m$, a smaller disparity search range $[d'_{min}, d'_{max}]$ is computed from $[d_{min}, d_{max}]$, the disparity search range of pixel $r$, with $\Delta d = (d_{max} - d_{min})/2^m$ and $m = 1, 2, \cdots, K_{rand}$. We randomly select a disparity $d' \in [d'_{min}, d'_{max}]$ from this range for each iteration $m$. In the LocalExp method, all pixels use the full disparity search range, while our method has a smaller disparity search range for each pixel. Therefore, our method converges faster than the LocalExp method.
For the normal vector, we add a random vector $\Delta'_n$ of size $\|\Delta'_n\|_2 = r_n$ to obtain a new normal vector $n'$, $n' \leftarrow n + \Delta'_n$. Here, different from the LocalExp method [10], at levels $l \in \{L - 1, \cdots, 1\}$ we use the angle search range obtained in the "Coarse-to-fine disparity constraint and 3D normal vector constraint" section. The angle $\theta$ between the new normal vector $n'$ and the input reference normal vector is first computed. Then, we repeatedly recompute $n'$ with a random vector $\Delta'_n$, $n' \leftarrow n + \Delta'_n$, until $\theta \in [\theta_{min}, \theta_{max}]$. We initialize $r_n$ by setting $r_n \leftarrow 1$ and update it by $r_n \leftarrow r_n/2$ for each iteration. Finally, we convert $d'$ and $n'/|n'|$ to the plane representation $\alpha \leftarrow (a', b', c')$ to obtain a perturbed candidate label. After that, we update the current labels of the pixels $p$ in the optimization window $O_i$, choosing for each pixel between its current label $f_p$ and the new candidate label $\alpha$. Therefore, we obtain an improved solution whose energy is lower than or equal to that of the current solution.
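The refinement step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the shrunken disparity range is centered on the current disparity and clipped to $[d_{min}, d_{max}]$, that angles are measured in degrees, and that the usual plane/normal conversion $a = -n_x/n_z$, $b = -n_y/n_z$ applies. It also adds a `max_tries` fallback (keeping the current normal) so that the rejection loop always terminates. All function and parameter names are hypothetical.

```python
import numpy as np

def label_to_normal(label):
    """Unit normal (with n_z > 0) of the disparity plane d = a*x + b*y + c."""
    a, b, _ = label
    n = np.array([-a, -b, 1.0])
    return n / np.linalg.norm(n)

def plane_from(d, n, x, y):
    """Label (a, b, c) whose plane has normal n and disparity d at pixel (x, y)."""
    a, b = -n[0] / n[2], -n[1] / n[2]
    return np.array([a, b, d - a * x - b * y])

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

def perturb_label(label, x, y, d_range, m, r_n, rng,
                  ref_normal=None, theta_range=None, max_tries=50):
    """Return a perturbed candidate label for pixel (x, y) at refinement iteration m."""
    d_min, d_max = d_range
    d = label[0] * x + label[1] * y + label[2]
    delta_d = (d_max - d_min) / 2 ** m
    lo, hi = max(d_min, d - delta_d), min(d_max, d + delta_d)  # assumed form of the range
    if lo > hi:  # current disparity lies outside the allowed range
        lo, hi = d_min, d_max
    d_new = rng.uniform(lo, hi)

    n = label_to_normal(label)
    n_new = n  # fallback if no admissible perturbation is found
    for _ in range(max_tries):
        delta = rng.normal(size=3)
        delta *= r_n / np.linalg.norm(delta)  # random direction of length r_n
        cand = n + delta
        if abs(cand[2]) < 1e-6:
            continue  # plane would be (nearly) undefined as a disparity function
        unconstrained = ref_normal is None or theta_range is None
        if unconstrained or theta_range[0] <= angle_deg(cand, ref_normal) <= theta_range[1]:
            n_new = cand
            break
    return plane_from(d_new, n_new / np.linalg.norm(n_new), x, y)
```

A caller would halve `r_n` and increment `m` after each refinement iteration, as described in the text.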

Coarse-to-fine disparity constraint and 3D normal vector constraint
The above matching is based on continuous 3D labels of superpixels. However, a large disparity search range may lead to inaccurate disparity, especially in textureless or low-texture regions. To obtain accurate disparity in those regions, we build on our previous coarse-to-fine disparity constraint method proposed by Li et al. [15] and introduce a new coarse-to-fine 3D normal vector constraint.
As can be seen from Figure 5, for each pixel $(x, y)$ at the finer level $l$, the disparity search range and the angle search range are computed by the following steps.
Step 1: Compute $x' = \lfloor x/2 \rfloor$ and $y' = \lfloor y/2 \rfloor$, and find the pixels with valid disparities around pixel $(x', y')$ at level $l + 1$, which are expressed as $p_{ul}$ and $p_{ur}$, $p_l$ and $p_r$, and $p_{dl}$ and $p_{dr}$, as shown in Figure 5(b1) or (b2).
Step 2: Find the minimum disparity $d'_{min}$ and the maximum disparity $d'_{max}$ among the above pixels. For pixel $(x, y)$ at level $l$, the disparity search range $[d_{min}(x, y), d_{max}(x, y)]$ is then defined from $d'_{min}$ and $d'_{max}$ as in [15], using a given disparity margin $\Delta D$.
Step 3: Select the normal vector of any pixel among $p_{ul}$, $p_{ur}$, $p_l$, $p_r$, $p_{dl}$, $p_{dr}$, $p_u$, and $p_d$ as the reference normal vector, and then calculate the angles between the normal vectors of the other pixels and the reference normal vector. The minimum and maximum angles are expressed as $\theta'_{min}$ and $\theta'_{max}$, respectively. For pixel $(x, y)$ at level $l$, the angle search range $[\theta_{min}(x, y), \theta_{max}(x, y)]$ is then defined from $\theta'_{min}$ and $\theta'_{max}$, using a given angle margin $\Delta\theta$.
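The three steps above can be sketched in Python as follows. The exact formulas for the two search ranges are not reproduced in this text, so the sketch makes explicit assumptions: coarse disparities are doubled when moving to the finer level (`scale=2.0`), both ranges are widened symmetrically by their margins, angles are in degrees, the neighborhood is the full 3 × 3 block around $(x', y')$, and the first valid neighbor supplies the reference normal. The function name and parameters are hypothetical.

```python
import numpy as np

def search_ranges(x, y, coarse_disp, coarse_normals, valid,
                  delta_d=1.0, delta_theta=1.0, scale=2.0):
    """Return ((d_min, d_max), (theta_min, theta_max)) for fine pixel (x, y).

    coarse_disp:    (H, W) disparity map at the coarser level
    coarse_normals: (H, W, 3) normal map at the coarser level
    valid:          (H, W) boolean mask of reliable coarse pixels
    Returns None when no valid coarse neighbor exists.
    """
    h, w = coarse_disp.shape
    xc, yc = x // 2, y // 2                       # Step 1: parent pixel
    ys = slice(max(yc - 1, 0), min(yc + 2, h))
    xs = slice(max(xc - 1, 0), min(xc + 2, w))
    mask = valid[ys, xs]
    if not mask.any():
        return None

    d = coarse_disp[ys, xs][mask]                 # Step 2: disparity range
    d_range = (scale * d.min() - delta_d, scale * d.max() + delta_d)

    normals = coarse_normals[ys, xs][mask]        # Step 3: angle range, shape (k, 3)
    ref, others = normals[0], normals[1:]
    if len(others) == 0:
        theta = np.zeros(1)
    else:
        cos = others @ ref / (np.linalg.norm(others, axis=1) * np.linalg.norm(ref))
        theta = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    theta_range = (max(theta.min() - delta_theta, 0.0), theta.max() + delta_theta)
    return d_range, theta_range
```

Ranges of this kind would then bound the random disparity and normal perturbations at the finer level.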

Experimental results
To evaluate the performance of our proposed method, two datasets are used: the Chang'e-3 Yutu rover dataset (http://planetary.s3.amazonaws.com/data/change3/pcam.html) and the KITTI dataset (http://www.cvlibs.net/datasets/kitti/). Our method is compared with the state-of-the-art LocalExp method [10]. We use the following settings throughout the experiments. The parameters for the data term are set to $\{e, \tau_{col}, \tau_{grad}, \alpha\} = \{0.01^2, 10, 2, 0.9\}$. The size of window $W_p$ is set to 41 × 41 pixels. For the smoothness term, we use $\{\lambda, \tau_{dis}, \epsilon, \gamma\} = \{1, 1, 0.01, 10\}$ and eight neighbors for $N(p)$. These parameters are set the same as those in the LocalExp method [10]. In the LocalExp method, three grid structures with cell sizes of 5 × 5, 15 × 15, and 25 × 25 pixels are used, and the iteration numbers $\{K_{prop}, K_{RANS}, K_{rand}\}$ are set to {1, 1, 7} for the first grid structure and {2, 1, 0} (only the propagation step) for the other two grid structures.
In our method, the number of pyramid levels is set to 3 for the Chang'e-3 rover dataset and 2 for the KITTI dataset. For each level, the number of superpixels is set to 1200, which is analyzed in the "Analysis of parameter and processing time" section. The sizes of the three windows $U_i$, $O_i$, and $A_i$ are adaptively adjusted according to each superpixel. For each superpixel, the iteration numbers $\{K_{prop}, K_{RANS}, K_{rand}\}$ are set to {1, 1, 7}. We iterate the main loop three times (one time only with the data term and two times with both the data term and the smoothness term) in our proposed method and eight times (two times only with the data term and five times with both the data term and the smoothness term) in the LocalExp method.
In all the experiments, we use only the CPU, not the GPU, to compare our method with the LocalExp method. All the experiments are executed on a laptop with an Intel Core i5-8250 1.60 GHz CPU and 8 GB of memory, and our code is implemented in C++ with the OpenCV library in Microsoft Visual Studio 2017.

Evaluation on Chang'e-3 rover dataset
The input Chang'e-3 stereo images are first rectified by the epipolar rectification method [27]. For the coarse-to-fine processing, we construct a three-level pyramid ($L = 3$) for the Chang'e-3 Yutu rover dataset. We choose five stereo pairs from the rover dataset; the left images, shown in the top row of Figure 6, contain many low-texture regions. The disparity search ranges of the five stereo pairs are about [−97, −9], [−40, 150], [−20, 119], [3, 105], and [−132, 3] pixels, respectively.
We qualitatively compare our proposed method with the LocalExp method [10]. As can be seen from Figure 6, the disparity maps computed by our method (the third row of Figure 6) are superior to those computed by the LocalExp method (the second row of Figure 6). For example, stereo pair-1 (shown in Figure 6(a)) contains a huge rock, disparity discontinuities, and many low-texture regions. Our method obtains high-precision disparity in many low-texture regions, while the LocalExp method obtains false or unreliable disparity in these regions. The other four stereo pairs (shown in Figure 6(b) to (e)) include low-texture, repetitive-texture, precipice, and disparity-discontinuity regions; in particular, there is strong light on the left of Figure 6(d). All the precipice, low-texture, and repetitive-texture regions, as well as the regions with varying or strong light, are perfectly reconstructed with our proposed method, while some wrong disparities exist in these regions with the LocalExp method.

Evaluation on KITTI dataset
Because no standard rover stereo dataset with ground truth is available, the accuracy of our method cannot be quantitatively evaluated on rover images. Therefore, we choose the KITTI dataset (http://www.cvlibs.net/datasets/kitti/) created from a driving platform [29], whose imaging mode is similar to the rover imaging mode; the size of the images is 1226 × 370 pixels. We construct a two-level pyramid ($L = 2$) for the KITTI dataset with our method. In this experiment, we use all 194 training image pairs, for which ground truth disparity maps for reflective regions are available, and the evaluation metric is an error threshold of 3 pixels, which is the same as in the KITTI benchmark [30]. We test the accuracy and efficiency of the disparity maps computed by our proposed method, our improved MGM method [15], and the LocalExp method [10] on the KITTI dataset. The quantitative results of the different methods are listed in Table 1, which gives the average results over all 194 training image pairs in reflective regions. In this table, Out-Noc is the percentage of erroneous pixels in nonoccluded areas, Out-All is the percentage of erroneous pixels in total, Avg-Noc is the average disparity error in nonoccluded areas, and Avg-All is the average disparity error in total. The proposed method performs better than our previous MGM method [15] and the LocalExp method [10]. Our method significantly improves the percentage of erroneous pixels and the average disparity error: Out-Noc and Out-All decrease from 12.67% and 13.57% (LocalExp method) to 7.27% and 8.17% (our method), respectively, while Avg-Noc and Avg-All decrease from 2.05 pixels and 2.19 pixels (LocalExp method) to 1.37 pixels and 1.53 pixels (our method). Figure 7 shows several examples of disparity maps, and Table 2 gives the corresponding quantitative results of our previous MGM, the LocalExp method, and the proposed method.
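As an illustration of how such metrics are typically computed, the following Python sketch returns the outlier percentage and the average absolute disparity error over a set of ground-truth pixels. Passing a mask of nonoccluded ground-truth pixels would give Out-Noc and Avg-Noc, and a mask of all ground-truth pixels would give Out-All and Avg-All. The function name and the mask convention are assumptions made for illustration, not the benchmark's official evaluation code.

```python
import numpy as np

def disparity_errors(pred, gt, valid, threshold=3.0):
    """Return (outlier percentage, mean absolute error) over pixels where `valid` is True.

    A pixel counts as erroneous when its absolute disparity error exceeds `threshold`.
    """
    err = np.abs(pred[valid] - gt[valid])
    if err.size == 0:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * float(np.mean(err > threshold)), float(err.mean())
```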
We can see that our proposed method significantly improves both the percentage of erroneous pixels and the average disparity error.
As shown in Figure 7, in low-texture or textureless regions, for example, shadowed regions, roads, and regions under strong light, the LocalExp method tends to generate errors due to nonconvergence, whereas in the disparity maps generated by our method, the errors in these regions are almost eliminated.

Efficiency evaluation
Table 1. The average quantitative results with our previous MGM [15], LocalExp method, and our proposed method for the training images on KITTI dataset using the default error threshold of 3 pixels in reflective regions. Columns: Method, Out-Noc (%), Out-All (%), Avg-Noc (pixels), Avg-All (pixels).

We evaluate the efficiency of our method and the LocalExp method, both of which are implemented on the CPU. The two methods mainly have two computation parts: the calculation of the data term of equation (3) and that of the smoothness term of equation (5). Each term requires $O(|W|)$ of computation, where $|W|$ is the window size. For our method, as shown in Figure 3, $|W|$ is the size of $O_i$ in the data term and the size of $A_i$ in the smoothness term. For the LocalExp method, $|W|$ is the size of the filtering region $M$ in the data term and the size of the expansion region $R$ in the smoothness term. The computational complexity of our method is estimated by the sum of the computation for all the superpixels at all levels, while the complexity of the LocalExp method is estimated by the sum of the computation for all the cells of sizes 5 × 5, 15 × 15, and 25 × 25 pixels. In these estimates, $n_1$ and $n_2$ are the optimization times of our method and the LocalExp method, respectively, and they include the times with the data term and the times with the smoothness term. Here, $n_1$ is set to 3 (one time only with the data term and two times with both the data term and the smoothness term) and $n_2$ is set to 7 (two times only with the data term and five times with both the data term and the smoothness term).
We approximately compare the computational complexity of our method with that of the LocalExp method for the five stereo pairs of the Yutu rover. Table 3 gives the approximate computational complexity of our method estimated by equation (12) and that of the LocalExp method estimated by equation (13). Finally, the ratio between the two is computed. As presented in Table 3, the computational complexity of our method is less than 20% of that of the LocalExp method.
We first compare the processing time of our method against that of the LocalExp method on the five stereo pairs of the Yutu rover; the results are shown in Figure 8. Our method greatly reduces the processing time: it is about six times faster than the LocalExp method. For example, generating the disparity map for stereo pair-1 takes about 3021 s with the LocalExp method but about 493 s with our method.
As shown in Figure 9, we also compare the processing time of our method with that of the LocalExp method.
We consider that two main factors contribute to the acceleration of our method. First, SLIC segmentation makes our method converge faster. Accordingly, in these experiments, our method iterates the main loop three times (one time only with the data term and two times with the data term and the smoothness term), which is enough to obtain good results, while the LocalExp method iterates eight times (two times only with the data term and five times with the data term and the smoothness term). Second, the LocalExp method uses three different grid structures with cell sizes of 5 × 5, 15 × 15, and 25 × 25 pixels, while in our method the number of superpixels at every level is set to 1200.

Analysis of parameter and processing time
We evaluate the sensitivity of our method to some parameters. The number of pyramid levels $L$ and the number of superpixels are analyzed first. Figure 10 shows, as an example, the performance on image pair no. 44 of the KITTI dataset, which contains large untextured regions and shadow regions, with different pyramid levels and different numbers of SLIC superpixels. We evaluate the disparity maps generated with five different numbers of SLIC superpixels (400, 800, 1200, 1600, and 2000) and four different numbers of pyramid levels (4, 3, 2, and 1). The error rates Out-Noc and Out-All generated with different levels and different numbers of SLIC superpixels are shown in Figure 10(d) and (e). The error rate with two levels is much lower than that with one, three, or four levels. The Avg-Noc and Avg-All with different levels and different numbers of SLIC superpixels are shown in Figure 10(f) and (g). The disparity error with two levels is much lower than that with the other numbers of levels. Therefore, we choose a two-level pyramid. Figure 10(h) gives the processing time with different levels and different numbers of SLIC superpixels. When the number of pyramid levels is fixed, the processing time increases almost linearly with the number of SLIC superpixels. Figure 10(i) shows the error rates Out-Noc and Out-All with different numbers of SLIC superpixels for a two-level pyramid. When the number of SLIC superpixels is 1200, the disparity error rate is much lower than with the other numbers.
Then, to further verify the relationship between the optimization times and the matching results, we give quantitative results under four different optimization times, that is, one time (one time with both the data term and the smoothness term), two times (one time only with the data term and one time with both the data term and the smoothness term), three times (one time only with the data term and two times with both the data term and the smoothness term), and five times (two times only with the data term and three times with both the data term and the smoothness term). The performance results with different iteration times for a two-level pyramid and 1200 superpixels are given in Table 4. As the number of main-loop iterations increases from 1 to 3, the error rates Out-Noc and Out-All decrease from 7.80% and 8.30% to 3.14% and 3.23%, and the average disparity errors Avg-Noc and Avg-All become smaller, while the processing time becomes longer. When the optimization times reach 5, both the error rate and the processing time increase. Therefore, the optimization times are set to three (one time only with the data term and two times with both the data term and the smoothness term) in our method.
After that, we evaluate the sensitivity of our method to some other parameters. The coefficient $\lambda$ in equation (1), $\Delta D$ in equation (10), and $\Delta\theta$ in equation (11) are analyzed based on image pair no. 44 of the KITTI dataset. We choose different values of $\lambda$ (0.1, 0.5, 1, 2, 5, and 10), and the results are given in Table 5. When $\lambda$ equals 0.5 or 1, better results are obtained. However, a larger $\lambda$ consumes more time. Therefore, we choose $\lambda = 0.5$. The results with different $\Delta D$ and $\Delta\theta$ are given in Tables 6 and 7. From Table 6, we can see that the processing time increases with $\Delta D$, while good results are obtained from $\Delta D \geq 1$ onward. As given in Table 7, the processing time is almost the same for different values of $\Delta\theta$. When $\Delta\theta \geq 1$, we get a slightly better result. Therefore, we choose $\Delta D = 1$ and $\Delta\theta = 1$ for all the experiments in this article.

Conclusion
An accurate stereo matching method based on continuous 3D plane label estimation has been proposed for rover stereo images. Unlike the previous LocalExp method, which uses three grid structures with different cell sizes, we propose a coarse-to-fine hierarchical stereo matching method based on superpixels, which makes the convergence faster. The experimental results on the Chang'e-3 lunar rover dataset and the KITTI dataset show that, compared with the state-of-the-art 3D labeling method, our method can generate more accurate disparity maps, especially in low-texture regions of images.
Although our method is several times faster than the LocalExp method, it still needs several hundred seconds of processing time, which cannot meet the requirements of actual rover applications. Therefore, in the future, we will implement a parallel version of the algorithm on the GPU to meet the speed requirements of practical applications.