Composite clustering normal distribution transform algorithm

Scan registration is a fundamental step for the simultaneous localization and mapping of mobile robot. The accuracy of scan registration is critical for the quality of mapping and the accuracy of robot navigation. During all of the scan registration methods, normal distribution transform is an efficient and wild-using one. But normal distribution transform will lead to the unreasonable interruption when splitting the grid and can’t express the points’ local geometric feature by prefixed grid. In this article, we propose a novel method, composite clustering normal distribution transform, which comprises the density-based clustering and k-means clustering to aggregate the points with similar local distributing feature. It takes singular value decomposition to judge the suitable degree of one cluster for further division. Meanwhile, to avoid the radiating phenomenon of LIDAR in measuring the points’ distance, we propose a method based on trigonometric to measure the internal distance. The clustering method in composite clustering normal distribution transform could ensure the expression of LIDAR’s local distribution and matching accuracy. The experimental result demonstrates that our method is more accurate and more stable than the normal distribution transform and iterative closest point methods.


Introduction
In unknown environment, simultaneous localization and mapping technology (SLAM) 1 is the cornerstone for the autonomous and intelligence of mobile robot. 2 Scan registration is a fundamental step for SLAM and autonomous navigating. 3 Surrounding information of robot is obtained by distance sensors, such as LIDAR or depth camera. Scan registration algorithm first establishes the association of two near LIDAR frames, then constructing the objective function. Finally, it will be optimized to get the transform between two LIDAR frames. With continuing scan registration, the environment model around the robot is built.
At present, there are two mostly widespread scan registration algorithms. The first one is based on iterating the closest points. The objective function is constructed by the geometric constraints of points, lines, and surfaces between two frames. The typical algorithm is iterative closest point (ICP). 4 The second one is based on the probability density function to build the relation. The objective function is built by the probability distributing model of current frame in reference frame. Its typical algorithm is normal distribution transform (NDT). 5 The main idea of this article is based on NDT method.
In this article, by analyzing the characteristic of NDT, we find that when fitting the local region of LIDAR data, NDT method will lead to unreasonable interruption while splitting points by normal distribution. Also NDT can't express the local geometric features by prefixed grid cells. A new scan registration algorithm based on composite clustering is proposed in this article, in which the points with similar local feature in the reference frame are aggregated. The points are clustered by density-based spatial clustering of applications with noise (DBSCAN) and k-means method. DBSCAN is used to aggregate the region with continuous point distribution in reference frame. Then, the unsuitable clusters are selected by singular value decomposition (SVD)-based method and subdivided again by k-means if this cluster is unsuitable for normal fitting. The bounding box of each cluster is built for search and index. Finally, objective function of all matches is formed by normal distributing function and then optimized by trust region method.
The main contributions of this article are as follows: (1) We propose a novel clustering method to aggregate LIDAR points by their local features, which is a combination of density-based clustering and kmeans clustering. (2) To judge a cluster's suitable degree for NDT matching after DBSCAN, we propose a ratio factor based on SVD decomposition to get a cluster's singular values. (3) We propose a novel distance measuring method based on trigonometric to evaluate the neighbor LIDAR points, which eliminate the LIDAR's radiation phenomenon.
This article is organized as follows. In "Related work" section, we discuss the related work about scan registration. In "Problem statement" section, the basic procedure of NDT is narrated, and the disadvantage of original NDT method is discussed. Section "Composite clustering scan registration" is the main text of this article, and the overall procedure of composite clustering NDT (CCNDT) is narrated in detail. In "Experiments" section, the comparing experiments among CCNDT, NDT, and ICP are designed and discussed. The last section is a brief conclusion of this article.

Related work
LIDAR is a main exteroceptive sensor in mobile robot 6 when acquiring distance data from the robot's surrounding. In contract with wheel odometer and IMU, LIDAR information represent the correlation between robot and environment. With the map built by LIDAR data, robot can realize where it locates and where it is the destination and path. Scan registration is a method to get the transform between LIDAR frame, which is critical for robot.
Scan registration aims to construct the relation of two LIDAR frames by matching their points, then building the objective function and optimizing. The most representable scan registration methods are ICP and NDT methods. ICP is first proposed by Besl and Mckay,4 in which the frame relation is built by matching the closest point pairs. But without additional information such as geometric characteristics, the matching is worse when there is a large transform. Then, Segal et al. 7 propose a general ICP method using the local inner points to get the face structure, which makes the result more accurate. Serafin and Grisetti 8-10 use the surface characteristics around the points to search for ICP corresponds. Cox and colleagues 11 extract line features from LIDAR data and match the points with relevant lines. Iterative dual correspondence method 12 is used to improve matching accuracy by maintaining two relevant sets. Wang et al. 13 come up with a multilayer matching method to deal with the data-association problem, in which the uncertainty of the matching results is described by Fisher information matrix. MbICP 14 is designed to improve convergence with large initial orientation errors, which explicitly put a measure of rotational error as part of the distance metric. Those ICP methods are easy to deploy and use, but their two weaknesses 15 are considerable: the nearest match is not correct matching in many conditions and the computing cost is very heavy when searching the nearest points.
To avoid the heavy matching of ICP, Biber and Strasser 5 come up with the probability-based method named as NDT (normal distribution method). It divides the reference frame into grid cells. Then, the normal distributing parameter is calculated for each cell. Next, each point of current frame is matched with a grid by their coordinate to get the probability density. With this method, it needn't match each point but the grid cells to reduce computational cost. Moreover, normal distribution can provide the geometric representation of points, which gives the higher order derivatives for optimizing. 16 Based on NDT, Gutmann and Konolige 17 registrant current frame with serval previous frame to improve precision. Mitra et al. 18 use quadratic approximants to get the distance from target surface, describing the model surface implicitly. Ulas and Temeltas 19 come up with a featurebased NDT method. Kunjin Ryu et al. 20 use ND-to-ND method to get global matching, which can not only provide the internal frames transform but also be used for loop closure detection 21 for graph optimization. Gouveia et al. 22 use the multithreading to parallelize the data fusion step. Even more, Bosse et al. 16 match all the LIDAR frames by registering the subregions and global map. And in kinect fusion by Newcombe et al., 23 the relationship is built by a global optimizing model, but it is too heavy and slow for most calculating platforms. There are also many other scan-matching methods such as expectation-maximization (EM) matching method by Thrun et al. 24 and the graph model method by Feng and Milios. 12

Problem statement
Compared with other scan registration methods, NDT has the following two advantages when building the correlation: (1) The normal distributing model gives the points in one cluster a common attribute, which reduces the quantity in matching. (2) Compared with the error function of point distance, the normal distributing error function has higher-order derivation, which makes it possible to use the optimizing method with the secondorder gradient.
The scheme of NDT includes three parts, as shown in Figure 1. Firstly, the points in reference frame are divided into grid cell and constructed the normal distribution of the points in each cell. Then, the points of current frame are linked with those cells by their relevant coordinate. Finally, the error of those links is minimized to get the transform. Next, we'll have an elaborate narration of the inference.
When the robot moving, distance points are obtained from LIDAR.
É individually represent the current and previous frames indexed by m; m À 1, and the point number is denoted by i. Scan registration aims to get the coordinate transform D m!mÀ1 ¼ x D y D q D ½ T , where x D ; y D is a plane transform and q D is rotation. In this article, we use "reference frame" to represent the previous frame P mÀ1 .
In NDT method, the reference frame is first divided into grids, as shown in Figure 2. In advance, it needs to get the grid span in vertical and horizontal. Then, the frame is divided into span distance and cell interval. The values of span distance and cell interval are manually tuned.
Each point will be filled into cells according to their inclusion relation. Then, each cell is represented by normal parameter. For instance, the parameter of points in one cell is where ði;jÞ z mÀ1 is the mean of the points in grid cell i; j ð Þ, and the sign i; j ð Þ on the top left is the grid cell's index, which means the cell located at ith row and jth column. The sign m or m À 1 on top right is time index of LIDAR data. p k is the coordination of point k. D m is transform parameter to be optimized from pose m À 1 to m. And S is covariance of the cell. And T p; D ð Þ is a transform operator as explicitly shown as Here the coordinate of p is x p ; y p ; q p h i , and the coordi- With the reference frame has been represented by normal distribution, the point in current frame can be linked to the reference cells. The current frame is transformed to the coordinate of reference frame with the initial value from Odometer or IMU. If no input source, initial value will be set as D m . Schematic illustration of NDT method. When a robot moves from x 1 to x m , its LIDAR sensor will capture the LIDAR points P i of surrounding in each pose. And NDT method will build the relationship between points and grid cells of two neighbor LIDAR frames, such as P mÀ1 and P m . Then, the transform relation will be calculated by objective function. NDT: normal distribution transform. frame is linked to a relevant grid cell by coordinate. The point's error function is obtained by normal distributing density function where f is a normal distributing probability density from points k to grid cell i; j ð Þ. And the overall objective function is The gradient and Hessian of objective function are calculated to optimize the displacement of the two robot poses, which will be deduced in "Range-based decomposition" section. Then, Newton method is used to optimize the transform parameter Here, g m is the gradient of overall objective function, and H m is its Hessian matrix.
The inference above is NDT's scan-matching procedure. It is ingenious to use normal distribution to represent the reference LIDAR frame. Thus, the optimizing step could utilize the normal probability density function to get the first-order and second-order derivatives.
But while using NDT method in real LIDAR data, it is found that the points should belong to one cluster according to their local feature, but they are split unsuitable by prefixed grid boundary, which make points aggregate to the wrong cluster.
As shown in Figure 2, the LIDAR frame should be divided into 20 local sets according to its local shape but resulted into 31 by the fixed interval division of NDT, where the unsuitable split is 35.5%.
As shown in Figure 3, the left figure illustrates the points are divided based on their local distributing feature, the set's center is same as the points' centroid. Then, as shown in the arrow of Figure 3(a), the new point will get good match with the cluster. But the ineligible division in Figure 3(b) will lead to unreasonable matching, where points are divided by the prefixed boundary, like NDT method, and the point after matching can't link with the target cluster's centroid. So, the precision of NDT is limited by the prefixed grid boundary, which can't adapt to the local distribution's variety.
The core thought of our method is motivated by the disadvantage of ineligible grid splitting of NDT. To deal with this problem, we compose a new point dividing method based on composite clustering, which can adapt to the local distribution of LIDAR points. In this way, the accuracy of scan registration is improved as shown in the "Experiment" section.

Algorithm overview
The scheme of CCNDT method is introduced summarily, of which the core goal is to get the transform between two robot poses by their LIDAR and Odometer data. As shown in Figure 4, the whole process includes five steps: (1) It is started from the reference frame, where the inputted reference LIDAR data are first processed by the density clustering (see "Local distributingbased clustering" section). And the trigonometricbased distance method (see "Distance measuring method-based on trigonometric function" section) is used to measure the interval between each point. The reference LIDAR points are clustered by their local distributing shape. (2) The outputted clusters from (1) are filtered by SVD method to judge their eligible degree. Otherwise, the ineligible clusters will be further divided (see "Range-based decomposition" section). (3) After the filter process, the ineligible cluster will be divided into k's small cluster by k-means method. The value of k is proportional to the ratio scale of this cluster's singular values (see "Rangebased decomposition" section). (4) Each clustering result will be given a bounding box for matching. Those points in current frame can be mapped into the bounding box by their relative coordination (see "Registration and objective function" section). (5) With the links from current frame's points to reference frames' clusters, the objective function is obtained by gathering each link's normal distribution error. The transform between these two LIDAR frames is calculated by the objective function (see "Optimization and solving" section).
In the next several sections, we will walk through each of those steps in detail.

Local distributing-based clustering
The characteristic of LIDAR points should be analyzed firstly. LIDAR is a surveying sensor, as shown in Figure 5, which measures distance to a target by pulsing laser light and measuring the flying time from launching to receiving the reflected pulses by a laser receiver. LIDAR will record each measurement's angle and distance when rotating. So, the obstacle in the laser illuminating path will be captured.
As shown in Figure 5, LIDAR points captured from the wall-like shape will form a straightline LIDAR distribution and from the cylinder shape will form an arc distribution. To make those points from same region clustered together and from different regions separated, it needs to get a method to identify the similarity and otherness among each object.
As the continuous of object boundary, the LIDAR point from the same obstacle is more continuous and has a higher reachable density than other areas. This condition is suitable for the density-based clustering named DBSCAN, 25 in which the points from the same cluster have a higher density than from the different clusters. DBSACN provides a model named "Density Reachability," which selects points satisfied the distance thresholds into one cluster. It is good at clustering the points in arbitrary distributing shapes.
In DBSCAN, D mÀ1 1 is an initial cluster, and put into a father point p mÀ1 x , which is randomly fetched from the reference frame P mÀ1 . Then it is going to find out all the points in P mÀ1 satisfied with the limitation: where b is the distance threshold. In "Distance measuring method based on trigonometric function" section, we will introduce a new method to initialize and calculate this threshold.
And the points satisfied with the limitation are put into set D mÀ1 1 ; meanwhile, removed from P mÀ1 . After ending a traversal round, the next turn starts with all the points from the last turn as the father points of this turn. Then, it continues searching until no more new point added in during a Where the inputs are current frame and reference frame, the output is the transform between two LIDAR frames. The reference frame is processed by DBSCAN clustering, SVD selection, and k-means clustering. Then, the normal distributing parameters are calculated and matched with the current frame by bounding box. Finally, with the matching indexes, the transform between two LIDAR frames can be calculated. CCNDT: composite clustering normal distribution transform; DBSCAN: density-based spatial clustering of applications with noise; SVD: singular value decomposition. new turn, as shown in Figure 6. So far, a density cluster D mÀ1 1 is formed. For the left points in P mÀ1 , the above iterating procedure will be repeated until emptying P mÀ1 . Finally, a series of Each of those clusters has two characteristics. (1) The points in one cluster are density connected within threshold b.
(2) All of the density reachable points from one cluster must belong to this cluster.
If the points in one cluster are too few N D mÀ1 i < z, it will be improper for normal distributing fitting, where N D mÀ1 i is the number of points in cluster D mÀ1 i , and z is an adjusting parameter depend on the resolution of LIDAR. When the LIDAR's angular resolution is 0.25 , we set z ¼ 3. So, these clusters will be deleted and their points will be collected into a special cluster named D mÀ1 0 . Then the series of clusters become D mÀ1

Distance measuring method based on trigonometric function
After clustering LIDAR points by DBSCAN, we find a disappointed weakness. As the radiating phenomenon of LIDAR, the LIDAR points from the straight facet would not be uniformly distributed in a line. The LIDAR point captured on the same object from different distance will be sampled in different interval, as shown in Figure 7. The nearby interval is close and the further interval is distant. So, the threshold b is unsuitable to be a constant when measuring the points interval.
Here, we use a new method based on trigonometric function to measure the density reachability. When starting a new cluster, there is only one initial father point in the cluster. So, it just needs to consider the distance. And the where Dq is the LIDAR's angular resolution, and r 0 is the distance of father point, as shown in Figure 8. After the initial procedure, the value of angle a can be obtained by the law of cosines with the initial points where the formula is based on the law of cosines, and l p 0 p 1 ; l op 0 ¼ r 0 ; l op 1 ¼ r 1 is the edge of triangle Dop 0 p 1 . Then, in triangle Dop 1 p 2 with two angles and one edge known, it is easy to calculate other parameters with the trigonometric function. We can get the distance method based on the trigonometric relation where b means that if the third point is collinear with the two initial points, the distance of p 1 p 2 should be l p 2 o in theory, as shown in Figure 8.
where Q LIDAR is the covariance of LIDAR's accuracy, and the covariance can be obtained by calibrating the   LIDAR. And l is the adjustment factor for LIDAR's physical noise.
As an example, to show the result after density clustering, one LIDAR scan of the ACES building is tested, which contains 1081 points. And after applying the density clustering and our trigonometric distance method, the LIDAR points are divided by their local distribution properly, as shown in Figure 9.

Range-based decomposition
After DBSCAN processing, the continual regions of LIDAR data have been aggregated together, as shown in Figure 9. But on the one hand, as the larger cluster has a larger bounding box, so a point may be included by this bounding box but far from the cluster's center, which will lead to heavy distorted links. On the other hand, the larger cluster has a strong attraction for its nearby points than a smaller cluster, which is because the weight of one cluster is in direct proportion to its quantity of points. And it will make the relaxing iteration unbalance. So, the result of DBSCAN is unsuitable for matching till now.
It needs to further decompose the larger cluster. The first question is how to select out the unsuitable clusters. It is noticed that such kind of clusters not only have a wider boundary but also have a big disparity of primary and secondary eigenvalues. So, each cluster in D mÀ1 È É first be performed with SVD to get its singular values where i S mÀ1 is the covariance of D mÀ1 The ratio from the biggest to the second singular value can denote the ineligible degree of one cluster.
where s 1 and s 2 are the primary and secondary singular values of the cluster's covariance, respectively. x is the ratio factor. If the ratio factor is bigger than 2, it means the larger cluster should be divided into two smaller clusters at least. Otherwise, it doesn't need to be divided. So, if x > 2, the cluster's shape is disproportion and unsuitable for normal fitting. Then, it is necessary to get a further division for those unsuitable clusters. The second question is how to further divide the unsuitable clusters. Those larger clusters can be divided by k-means method 26 into k smaller clusters. k-Means makes their inner distance of one cluster minimum and their outer distance maximum. The unsupervised learning method of k-means can adaptively select the coordinate of initial points. And with the Euclidean distance, the shapes of clusters outputted from k-means are suitable for normal fitting.
And its most important parameter is the clustering quantity k. Here, we use the previous SVD result to calculate the value of k where k mÀ1 i is the clustering number of the ith cluster in the m À 1 LIDAR frame. x is the ratio factor.
The result of k-means is where D mÀ1 i is decomposed into k clusters D mÀ1 i;1 ; n D mÀ1 i;2 . . . . . . D mÀ1 i;k g with Euclidean distance method dis ¼ jjp i À p j jj.
The cluster with few numbers of points also needs to be removed and puts into D mÀ1 O È É as mentioned in "Local distributing-based clustering" section. The cluster D mÀ1 O È É is useful when a point can't match with any cluster, then the points in D mÀ1 O È É can be an optional candidate to build the matching relation.
As an illustration, we select the unsuitable clusters in Figure 9 by SVD ratio factor, then applying k-means method to divide those unsuitable clusters. The change from Figure 9 to Figure 10 demonstrates that the unsuitable clusters are decomposed into smaller clique by k-means. Each of the clusters as the box marked in Figure 10 is more balance in each direction, which is more suitable for normal distribution fitting.
Till now, there are two groups of the cluster: the first group is the suitable results of DBSCAN and the second group is those unsuitable clusters decomposed by k-means. Then reorder them as For uniform, those two groups of clusters are integrated together and denoted as This step just changes the index of the clusters for their lower right corner.

Registration and objective function
After obtaining the proper clusters, here it is going to inference their matching parameters. As the origin NDT method inferred in "Problem statement" section, the normal distributing parameter of each cluster includes its mean and covariance. The mean of cluster D mÀ1 where i p mÀ1 k j k¼1;2...n f g 2 D mÀ1 i , and i is the index of cluster. And its covariance is When building the matching relation, the original NDT method uses grid cells to link the points and the clusters. But in CCNDT, the cluster's size and position are not regular, so here we use the bounding box as shown in Figure  11, for searching. The bounding box B i is denoted by two points' coordinates, the upper left corner and the lower right corner.
To judge whether a point belongs to this cluster, it just needs to compare the point's coordinate with the bounding box's two corners.
Till now, the relation between points in current frame and the clusters with bounding box in reference frame are constructed.
When the new LIDAR frame P m obtained, named current frame, we need to match each point in P m to their relevant cluster inD   (23) and (24).
judge whether p m i is included. All the sanctified box of p m i is B i ¼ fB k jB k 3 p m i g. The objective function of p m i to its relevant cluster B j ; jjB j 2B i n o is represented as the normal distributing probability density where h norm is normalizing coefficient as equation (6). i e m k is the error from the kth point in frame m to the cluster All the points' objective functions are gathered as which uses the logarithm to convert multiplication into sum, the inner summary layer is the errors of one point i with all of its linked clusters B j 2B i , the outer layer of sum is the errors of each point in P m .

Optimization and solving
To solve the objective function, the Jacobin g m and Hessian H m of objective function are needed first.
where Jacobin i g m k and Hessian i H m k are calculated from the objective function of one matching relations.
For each point's PDF, the gradient is where D m ¼ x y q ½ T is the increment of robot pose.
After substituting the normal parameters, the objective function is reshaped as In which the Jacobin matrix is obtained by which is the partial derivative of the error function i e m k with respect to each element of D m .
The Hessian matrix is the second derivation of objective function where l and r are the point indexes of the first-round derivation and second-round derivation, respectively. With objective function ubstituted in, Hessian is reshaped as In which the second derivative of i e m k for D is equivalent to the derivative of i J m for D.
where the result is 6 Â 3 matrix, almost elements are 0, expect the elements in position 5; 3 ð Þ and 6; 3 ð Þ. With the deduction above, we have got the overall objective function f m , gradient g m , and Hessian matrix H m . There are many methods to optimize this objective function. As Tyler fitting can only get the high-quality fitting nearby the initial point, the linear searching methods, such as the gradient method and newton method, are not suitable for the nonlinear optimizing condition.
In this article, the trust region-based method 27 is adopted to solve the objective function. This method evaluates the fitting condition by comparing the ratio of expected improvement from the model approximation and the actual improvement from objective function. It is a criterion to decide whether to expand or shrink the forward step. It evaluates the fitting condition with where the denominator is the actual improvement, which is calculated by the second-order Taylor expansion of object function f m . The numerator is the expected improvement, where f m P m þD m ð Þ and f m P m are the results of objective function at P m þ D m and P m , respectively. J P m and H P m are the Jacobin and Hessian of objective function f m , respectively.
With the fitting condition r, the forward step is larger in the well-fitting place and shorter in the worse. So, with the trust region-based method, the fitting is more flexible and adaptable.

Experiments
To evaluate the result of our method in real robot data, we use public data sets to test the mapping quality of our method. The first data set is from Intel Research Lab in Seattle, 28 which was recorded by a Pioneer II robot equipped with a SICK sensor and an odometer. The area of this data set is about 26 Â 26 m 2 and the robot traveled 506 m. The second data set was from the ACES building at the University of Texas provided by Patrick Beeson, 29 which includes 13,632 LIDAR frames.
To illustrate the final mapping results, we use the Karto SLAM 30 as a framework, in which the scan-matching parts are replaced by our CCNDT method to calculate the transform between two neighbor frames. It first takes the odometer data as the initial guess. Then, the output transform is aggregated into the Karto SLAM to generate the occupied grid map, as shown in Figure 12, which is deployed in ROS Kinetic environment. In addition, as the scale of the environment is very large and including many trajectory loops, the Karto SLAM is also used for loop closure matching and global optimization with sparse pose adjustment (SPA) method. 31 From the mapping result in Figures 12 and 13, it is clear that the maps built by CCNDT have high-quality and clear edges. Even in large environment, the mapping result still has very little distorted, which partly is contributed by Karto SLAM's loop closure and SPA optimization.
As there is no absolute ground truth in Intel Research Lab and ACES data sets, Burgard 32 provide a manual calibrating method to align the robot path with LIDAR frames, which perform a pair-wise registration to refine the estimates. This calibrated data were accessible online. 29 Then, we use three different methods, NDT, ICP and CCNDT, to process the LIDAR data set frame by frame. The output results are the transforms of each neighbor frame. They are compared by the metrical benchmark method 33 with the manually calibrated data. As the time stamps of scan matching and manual calibration are not one-to-one matched, so the calibrating series are interpolated to be synchronic with the scan-matching series, which will be regarded as the ground truth where D m is the scan-matching result, D inter is the manually calibrated transform sourcing from Kümmerle et al., 33  and e is the error between D m and D inter . T a; b ð Þ is a transform operator as equation (4).
The comparing results of each method are shown in Figure 14.
After statistical analysis for these error results from Figure 14, we got the mean and covariance of the error results in Table 1.
From Table 1, the mean error of CCNDT is 0.1879 m, which is lower than NDT 0.2726 m and ICP 0.2619 m. The sum error sources from three parts: the manually calibrating error, the scan-matching error, and the linear interpolating error by metrical benchmark method. Besides, the covariance of the CCNDT is 0.0295 m 2 , which is lower than the other two methods. Experimental result demonstrates that our method is more accurate and more stable than NDT and ICP methods.
Comparing the speed, CCNDT method processes the total data set costing 2791 s, which is obviously slower than NDT and ICP. This is because CCNDT method spends too much time in clustering procedure. In terms of matching speed, CCNDT method is not suitable for the robot with lower computing capability, but if the computing capability is sufficient, it is recommended taking CCNDT to get a better matching quality.

Conclusion
In this article, we compose a novel scan registration method, CCNDT, which based on normal distributing transforming method. In CCNDT, a composite clustering method, combining DBSCAN and k-means, is used to get the suitable splitting of LIDAR data. This method can adapt to the local distribution of LIDAR points. Meanwhile, we come up with a new distance measuring method based on trigonometric function, which can avoid the radiating distribution of LIDAR points.
We test CCNDT method with the public data set from Intel Research Lab and ACES building at the University of Texas. The experimental results indicate that CCNDT is more accurate and more stable than NDT and ICP methods. And the point clusters of CCNDT could ensure the continuity of the local region and get the eligible matching from points to clusters.
In our future work, we are going to deploy the proposed CCNDT method on the scan-to-map registration, because the CCNDT can express the feature of local point distribution and decrease the number of reference points by replacing them with clusters.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Figure 14. The comparing results of CCNDT (red dot), NDT (blue), ICP (green), where the horizontal axis is the pose index of each comparing pairs, which include 1774 pairs, and the vertical axis is the error calculated by the metrical benchmark method; the red dots (CCNDT) are located at the lower part than green (ICP) and blue (NDT) dots. CCNDT: composite clustering normal distribution transform; ICP: iterative closest point.