Upgraded trajectory planning method deployed in autonomous exploration for unmanned aerial vehicle

Autonomous exploration, which is widely deployed on unmanned aerial vehicles, is grounded on target decision and trajectory planning. However, existing methods generally focus only on the exploration effect of target decision and neglect the environment information gained through trajectory planning during flight, resulting in redundant exploration trajectories and low exploration efficiency. This article proposes an upgraded trajectory planning method for autonomous exploration. We design a new cost term that considers frontier information in trajectory optimization. In addition, yaw angles are planned independently to capture more environment information during flight. We present extensive simulations and real-world tests. The results show that the proposed method reduces exploration time by 10–15% compared with the previous method.


Introduction
Unmanned aerial vehicles (UAVs), especially quadrotors, are popular for their agility and ability to work at heights. Therefore, they are widely used in various fields, such as search and rescue and intelligence reconnaissance. In these applications, autonomous exploration plays an important role. As shown in Figure 1, autonomous exploration requires UAVs to decide on local targets, plan effective trajectories to reach them, and gather global environment information to build a map.
Exploration planning methods have recently received increased attention. Autonomous exploration can be divided into three parts, namely viewpoints decision, global tour path planning, and trajectory planning and optimization.
Viewpoints decision means that robots autonomously determine the next best viewpoints (NBVs), each consisting of a position and a yaw angle, that gain the maximum environment information according to the current local map. Some previous work focuses only on making local exploration as fast as possible. Yamauchi 1 first introduces an approach to exploration based on the concept of frontiers and simply treats the nearest frontier point as a local target. Cieslewski et al. 2 extend the classical frontier-based exploration method to enable exploration at high speeds. Their method selects a goal frontier in a way that minimizes the necessary change in velocity, making the total exploration time shorter at the cost of a longer path. The above methods [1][2][3] have been proposed to improve the efficiency of autonomous exploration. Among them, exploration planners decide on the next viewpoints in different manners, such as greedily navigating to the closest unknown region to maximize the immediate information gain 1 or refining a local set of viewpoints for a more efficient tour path. 3 Yu et al. 4 further propose a TSSEAE (Two-Sided Cross-Domain Collaborative Filtering model based on Selective Ensemble learning considering both Accuracy and Efficiency) method based on the work 5,6 to convert the problem of selecting combinations to that of selecting classifiers, which provides new ideas for clustering and classification of frontiers. In fact, the concept of NBVs was proposed long ago and was explained as a good sequence of range-image views for obtaining as complete a scene as possible. 7 Bircher et al. 8 first used NBVs in 3D exploration, expanding rapidly exploring random trees (RRTs) in free space. Based on this, extended work was proposed that considers the visual importance of different objects, 9 the uncertainty of localization, 10 and inspection tasks. 11 In the classic frontier-based exploration method, Zhou et al. 3 recognize frontiers as known-free voxels directly adjacent to unknown voxels and generate candidate viewpoints for each frontier cluster (black circle in Figure 2). Note that, unlike previous approaches which simply navigate to the center of a cluster, points are sampled uniformly around the average position of the cluster. For each sampled point lying within free space, the yaw angle is determined by maximizing the sensor coverage of the cluster, using a yaw optimization method similar to Witting et al. 12 In this article, we follow this strategy 3 to select and decide NBVs.
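As a rough illustration of the frontier definition above (known-free cells directly adjacent to unknown cells), the following sketch marks frontier cells in a small 2D occupancy grid. The cell labels, the 4-connectivity, and the function name are assumptions made for illustration only; the framework itself works on a 3D voxel map with an incremental update.

```python
UNKNOWN, FREE, OCCUPIED = -1, 0, 1  # hypothetical cell labels


def find_frontier_cells(grid):
    """Return (row, col) of every free cell that has an unknown 4-neighbour."""
    rows, cols = len(grid), len(grid[0])
    frontier = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue  # only known-free cells can be frontier cells
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontier.append((r, c))
                    break
    return frontier


grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
frontier_cells = find_frontier_cells(grid)
print(frontier_cells)
```

Clustering the detected cells and sampling viewpoints around each cluster would follow as separate steps.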
After getting a series of NBVs, the crux of the problem becomes how to plan a path as short as possible to connect the above points. This is what path planning does and is generally modeled as a traveling salesman problem (TSP).
Inspired by Meng et al., 13 Zhou et al. 3 reduce it to a standard asymmetric TSP (ATSP) and recompute an open-loop tour starting from the current position and passing viewpoints at all clusters. Nonetheless, the initial tour path involves only a single viewpoint of each cluster, which is not necessarily the best combination among all viewpoints. Therefore, by using the graph search approach, Zhou et al. 3 optimize the previous global tour path to further improve the exploration rate.
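To make the open-loop tour idea concrete, the sketch below enumerates all visiting orders of a few viewpoints and keeps the shortest path that starts at the current position and does not return to it. This brute-force search with straight-line distances is an illustrative stand-in, not the ATSP solver or the cost model used by the framework, and it is only practical for a handful of viewpoints.

```python
import itertools
import math


def open_loop_tour(start, viewpoints):
    """Brute-force the shortest open-loop visiting order beginning at `start`."""
    best_order, best_length = None, math.inf
    for order in itertools.permutations(range(len(viewpoints))):
        length, prev = 0.0, start
        for idx in order:
            length += math.dist(prev, viewpoints[idx])  # straight-line cost (assumed)
            prev = viewpoints[idx]
        if length < best_length:
            best_order, best_length = order, length
    return list(best_order), best_length


start = (0.0, 0.0)
viewpoints = [(4.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
order, length = open_loop_tour(start, viewpoints)
print(order, length)
```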
Discrete viewpoints and rough global tour paths can't be used for UAV motion execution, so continuous trajectories are required for smooth navigation, which is the work of trajectory planning and optimization.
Quadrotor online planning requires fast generation of safe, smooth, and aggressive trajectories. Among the methods proposed, local trajectory planning based on gradient information has become the mainstream for UAV local trajectory generation because it is a simple but efficient way to smooth a trajectory and improve its clearance. 14 The Euclidean signed distance field (ESDF), which stores the distance to the closest obstacle, was first introduced in robotic motion planning by Ratliff. 15 Ratliff et al. 15 and Kalakrishnan et al. 16 further proposed discrete trajectory optimization based on the ESDF, but it was not applied to drones because of their dynamical constraints. Therefore, Oleynikova et al. 17 proposed a continuous-time polynomial trajectory optimization method for UAV planning. However, it causes a heavy computation burden because of the integral of the potential function involved and has a low success rate of around 70%. Gao et al. 18 significantly increase the success rate by finding an initial collision-free path as the front-end. The performance is further improved by taking kinodynamic constraints into account in the initial collision-free path. 19 A robust and efficient quadrotor motion planning method 20 for fast flight in 3D cluttered environments was adopted by the framework. 3 It uses a kinodynamic path search to find an initial trajectory and improves the smoothness and clearance of the trajectory by B-spline optimization using ESDF information. Zhou et al. 21 proposed an ESDF-free gradient-based planning framework. Its collision term in the penalty function is formulated by comparing the colliding trajectory with a collision-free guiding A* path instead of using an ESDF, which greatly reduces the computation time. Unfortunately, the planned trajectories are not suitable for autonomous exploration because their aggressiveness causes instability and places an extremely high demand on tracking control. Once tracking is poor, collisions and even crashes may occur, which can be fatal to autonomous exploration. Therefore, in this article, our planner takes inspiration from the work 20, accepting more planning time in exchange for safety, and incorporates unknown environment information into the cost function of trajectory optimization, which makes the planner better suited to autonomous exploration. In addition, we change the yaw planning so that the field of view (FOV) of the UAV faces unknown areas as much as possible. However, the above planning methods ignore the critical role of the yaw angle in acquiring environment information, as they are applied only in purely autonomous flight. It is important to note that yaw is the window for environment-UAV interaction in autonomous exploration, which is the focus of this article.
Benefiting from the differential flatness of the quadrotor dynamics, 22 yaw can be planned independently of the 3D position. However, since autonomous navigation imposes strict requirements only on position for obstacle avoidance, existing trajectory planners simplify yaw planning to improve efficiency.
A representative local planner 21 obtains the yaw angle by sampling neighboring points on the trajectory and computing the yaw with the arctangent, so that the yaw is parallel to the UAV's velocity. Another local planner 20 is almost the same as Zhou et al., 21 except that Zhou et al. 20 impose a strict constraint on the yaw at the end of the trajectory. Based on this yaw constraint, the classic framework for autonomous exploration 3 adopts the method 20 in trajectory planning and can use the yaw information of the viewpoints.
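The velocity-aligned yaw described above can be sketched as follows: neighboring trajectory samples are differenced and the heading is taken with the two-argument arctangent. The function name, the 2D samples, and the choice to hold the last heading at the final sample are assumptions for illustration, not details taken from the cited planners.

```python
import math


def yaw_from_positions(points):
    """Yaw (rad) per sample, parallel to the finite-difference velocity."""
    yaws = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        yaws.append(math.atan2(y1 - y0, x1 - x0))
    if yaws:
        yaws.append(yaws[-1])  # assumed: keep the last heading at the final sample
    return yaws


samples = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
yaws = yaw_from_positions(samples)
print(yaws)
```

A yaw obtained this way always faces the direction of travel, which is the behavior the later yaw planning section sets out to relax.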
The above work has achieved autonomous exploration to a certain extent and improved exploration efficiency. However, existing methods focus only on the environment gain at the viewpoints and ignore the flight between viewpoints, resulting in repeated, inefficient planning and redundant trajectories. This wastes the potential of trajectory planning to reduce exploration time and makes exploration less efficient.
Motivated by the above facts, we implemented a smarter method in trajectory planning, which is based on a hierarchical framework 3 using an incremental frontier structure. By designing a fresh cost term called environment cost in the step of trajectory optimization and planning yaw angles independently with a more flexible strategy, we maximize the gain of environment information during flight.
We have compared our method with previous work 3 in simulation on unknown maps, showing that in all cases our method finishes autonomous exploration in a shorter time. Furthermore, we have verified our method in a cluttered real-world environment, showing that it can be applied stably and efficiently to real-world scenarios. The contributions of our work are summarized as follows:
1. A new cost term in trajectory optimization, which optimizes a tour trajectory by gaining more environment information in flight.
2. A yaw planning strategy, which simply gets a set of yaws heading toward the unknown area and reduces waste of exploration resources.
3. Extensive simulations and real-world tests that verify our method. The results show that this method can effectively reduce exploration time by 10–15%.
The article is organized as follows: The first and second sections describe the introduction and overview of the autonomous exploration frame. The third section explains a fresh environment cost term in trajectory optimization. The fourth section explains a creative yaw planning strategy. The fifth section describes the experimental verification including simulation and real-world. The final section is the conclusion of our work.

Frame overview
In this article, we refer to the hierarchical framework proposed by Zhou et al. 3 and follow its processing steps to improve autonomous exploration work.
As illustrated in Figure 3, the framework is composed of an incremental update of the frontier information structure and a hierarchical exploration planning approach. Once the map, represented as a grid, is updated, the framework detects whether any frontier clusters have changed and updates the frontier information. After determining the sets of viewpoints for the frontier clusters, it plans a global exploration tour path, optimizes local viewpoints, and generates the trajectory navigating to the best viewpoint. When no frontier remains on the map, the exploration is considered complete.
As shown in Figure 4, we focus on trajectory planning, which includes trajectory optimization and yaw planning, after receiving the selected viewpoint information.

Upgraded trajectory optimization deployed in exploration
In this section, we present our work, which adds a new environment cost so that the UAV gains more environment information during flight.
The framework we refer to, Zhou et al. 3, generates smooth, safe, and dynamically feasible B-spline trajectories and further optimizes all parameters of the B-splines based on the method. 20 The method 20 uses the classical front-end and back-end planning scheme to generate a high-quality trajectory, which is mathematically a piecewise polynomial. The hybrid-state A* search, which was first applied to autonomous vehicles, 23 is used for the front-end kinodynamic path searching. It searches a voxel grid map for a safe, kinodynamically feasible but suboptimal trajectory that is minimal with respect to time duration and control cost. Unfortunately, the path is often so close to obstacles that it is too dangerous to execute. It is therefore necessary to improve the smoothness and clearance of the path in the back-end trajectory optimization. A B-spline is a piecewise polynomial uniquely determined by its degree $p_b$, a set of $N + 1$ control points $\{Q_0, Q_1, \cdots, Q_N\}$, and a knot vector $[t_0, t_1, \cdots, t_M]$ with $M = N + p_b + 1$. For a uniform B-spline, each knot span $\Delta t_m = t_{m+1} - t_m$ has the same value. This is the basic principle of representing polynomial trajectories with B-splines. Our work focuses on the cost function in the optimization, so we refer readers to the work 20 for more details.
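To illustrate how control points define a uniform B-spline trajectory, the sketch below evaluates one segment with the standard uniform cubic basis. The degree $p_b = 3$, the 2D control points, and the function name are assumptions chosen for illustration; the text does not state which degree the planner uses.

```python
def cubic_bspline_point(ctrl, seg, u):
    """Evaluate segment `seg` of a uniform cubic B-spline at local u in [0, 1]."""
    q0, q1, q2, q3 = ctrl[seg:seg + 4]
    # Uniform cubic B-spline basis functions; they sum to 1 for every u.
    b0 = (1 - u) ** 3 / 6
    b1 = (3 * u**3 - 6 * u**2 + 4) / 6
    b2 = (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6
    b3 = u**3 / 6
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(q0, q1, q2, q3))


ctrl = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
p_b = 3                                   # assumed degree
num_segments = len(ctrl) - p_b            # N + 1 - p_b
start = cubic_bspline_point(ctrl, 0, 0.0)
end = cubic_bspline_point(ctrl, num_segments - 1, 1.0)
print(num_segments, start, end)
```

Because each segment depends on only four neighboring control points, moving one control point changes the curve locally, which is what makes control points convenient optimization variables.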
In our work, the optimization problem, which is based on Zhou et al. 3 and adds a new environment cost term $f_e$, is formulated as equation (1), where $\lambda_s$, $\lambda_c$, $\lambda_d$, $\lambda_{bs}$, $w_t$, and $\lambda_e$ are the weights for each cost term. $f_s$ is the elastic band smoothness cost similar to Quinlan and Khatib 24 and Zhu et al. 25 This formulation views a trajectory as an elastic band, 24 where the terms $F_{i+1,i} = Q_{i+1} - Q_i$ and $F_{i-1,i} = Q_{i-1} - Q_i$ are the forces of the two springs connecting the nodes $Q_{i+1}$, $Q_i$ and $Q_{i-1}$, $Q_i$, respectively.
The collision cost $f_c$ is the term that forces the trajectory to avoid collisions. It is formulated as a repulsive force, generated according to the distance to the closest obstacle, acting on each control point, where $d(Q_i)$ is the distance given by the ESDF. A threshold $d_{thr}$ can be set to specify the required obstacle clearance, from which the differentiable potential $F_c$ is defined. As the quadrotor dynamics are differentially flat, 22 we can ensure feasibility by restricting the higher-order derivatives of the trajectory in every single dimension. $v_{im}$ and $a_{im}$ are the $i$th velocity and acceleration in each independent dimension. $f_d$ is the penalty that ensures dynamic feasibility; it involves $f_v$ and $f_a$, which penalize infeasible velocity and acceleration, and $\lambda_v$, $\lambda_a$ are the weights of the corresponding dynamic feasibility cost terms. Velocity and acceleration are penalized when they exceed the maximum allowable values $v_{max}$ and $a_{max}$, which can be preset according to the agility of the real drone. $f_{bs}$ is the boundary cost for smooth motion at the start and end states. The 0th- to 2nd-order derivatives at the start are set to the instantaneous state. $Q_{c,i}$ is the $i$th control point of the $c$th piecewise polynomial trajectory, and $\dot{Q}_{c,i}$, $\ddot{Q}_{c,i}$ are its first- and second-order derivatives. $X_0$ is the current starting position, and the viewpoint $X_{vpt}$ enters the boundary cost as the end-state penalty. The final cost term is the total trajectory time $T$, which is composed of the knot span $\Delta t_b$ and the number of B-spline segments $N + 1 - p_b$.
The above are all the original cost terms of Zhou et al. 3 They are inspired by trajectory planning, which means paying attention to how to reach the observation point safely, smoothly, and feasibly. However, they all ignore the environment gain during flight, which is the difference between exploration and planning.
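For reference, one plausible written-out form of the cost terms described above is sketched below in LaTeX. It is an assumption assembled from the surrounding definitions, not the authors' exact formulas: the weighted-sum form of equation (1), the name $f_{total}$, the summation ranges, and the piecewise quadratic penalties follow the form commonly used in gradient-based B-spline planners. The boundary cost $f_{bs}$ is omitted because the text does not describe its form.

```latex
% Equation (1): assumed weighted sum of the cost terms named in the text
\min_{Q_i} \; f_{total} = \lambda_s f_s + \lambda_c f_c + \lambda_d f_d
                        + \lambda_{bs} f_{bs} + w_t T + \lambda_e f_e

% Elastic band smoothness cost (summation range assumed)
f_s = \sum_{i} \left\lVert F_{i+1,i} + F_{i-1,i} \right\rVert^2
    = \sum_{i} \left\lVert Q_{i+1} - 2 Q_i + Q_{i-1} \right\rVert^2

% Collision cost with clearance threshold d_thr (quadratic form assumed)
f_c = \sum_{i} F_c\bigl(d(Q_i)\bigr), \qquad
F_c(d) =
\begin{cases}
  (d - d_{thr})^2, & d \le d_{thr} \\
  0,               & d > d_{thr}
\end{cases}

% Dynamic feasibility cost (penalty shape assumed; f_a is analogous
% with a_{im} and a_{max})
f_d = \lambda_v f_v + \lambda_a f_a, \qquad
f_v = \sum_{i,m} F_v(v_{im}), \qquad
F_v(v) =
\begin{cases}
  (v^2 - v_{max}^2)^2, & v^2 > v_{max}^2 \\
  0,                   & \text{otherwise}
\end{cases}

% Total trajectory time: number of segments times the uniform knot span
T = (N + 1 - p_b)\, \Delta t_b
```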
Flight, the process of executing the trajectory, accounts for more than 90% of the whole exploration work. Therefore, we propose a method that considers the in-flight environment in the optimization, which brings the trajectory closer to the unknown area.
The incremental frontier information structure of the framework provides the frontier information that represents the area of interest in the exploration space. From a series of frontier points we obtain a cube. We use the center of the cube, $X_{ct}$, to indicate the area where large environment gains can be obtained, and we optimize the intermediate control points to bring the trajectory closer to $X_{ct}$ without affecting the start and end points of the original trajectory.
We therefore design the environment penalty cost term $f_e$ from the potential $F_e$, which is expressed as the squared distance between the control point $Q_i$ and the center of the frontier cube, $F_e(Q_i) = \lVert Q_i - X_{ct} \rVert^2$. The formula is inspired by the artificial potential field method proposed by Khatib, 26 which imagines that the center of the frontier cube generates a gravitational field that attracts the control point $Q_i$.
The formulation of the complete objective function is written as equation (1). And the optimal value of control points Q i is calculated by the gradient descent method with the general nonlinear optimization solver called NLopt. It is worth mentioning that increasing the weight of a parameter in equation (1) can improve the relevant characteristics of the trajectory accordingly, which we will explain in more detail in the "Simulation and analysis" section.
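The effect of the environment term can be illustrated with a reduced problem that keeps only the smoothness and environment costs and uses plain fixed-step gradient descent. This is a minimal sketch under stated assumptions: it does not use NLopt, it omits the collision, feasibility, boundary, and time terms, and the weights, step size, and function names are hypothetical values picked for the example.

```python
def environment_cost(ctrl, x_ct):
    """Sum of squared distances from interior control points to x_ct."""
    return sum(sum((q - c) ** 2 for q, c in zip(p, x_ct)) for p in ctrl[1:-1])


def optimize_with_environment_cost(ctrl, x_ct, lam_s=1.0, lam_e=0.1,
                                   step=0.02, iters=200):
    """Gradient descent on lam_s * f_s + lam_e * f_e with fixed end points."""
    pts = [list(p) for p in ctrl]
    n, dim = len(pts), len(pts[0])
    for _ in range(iters):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(1, n - 1):
            for k in range(dim):
                # Elastic band residual F_{i+1,i} + F_{i-1,i}
                r = pts[i + 1][k] - 2 * pts[i][k] + pts[i - 1][k]
                grad[i - 1][k] += 2 * lam_s * r
                grad[i][k] -= 4 * lam_s * r
                grad[i + 1][k] += 2 * lam_s * r
                # Attraction toward the frontier center: d/dQ ||Q - X_ct||^2
                grad[i][k] += 2 * lam_e * (pts[i][k] - x_ct[k])
        for i in range(1, n - 1):  # start and end points are not moved
            for k in range(dim):
                pts[i][k] -= step * grad[i][k]
    return [tuple(p) for p in pts]


ctrl = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
x_ct = (2.0, 2.0)  # hypothetical frontier cube center
optimized = optimize_with_environment_cost(ctrl, x_ct)
print(optimized)
```

In this toy setting the interior control points bend toward the frontier center while the smoothness term keeps the band from collapsing onto it; raising `lam_e` pulls the trajectory closer.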
As shown in Figure 5, the control points are attracted to the region closer to the unknown environment after considering the frontier information.
In the real world, UAVs can get more environment information through sensors under the constraint of the FOV. As verified by simulation, the new strategy reduces repeated exploration of an area, especially in complex corners.
In Figure 6, we built a classic and common indoor exploration scene to show the beneficial effect. In the original work, without the environment penalty term in optimization, UAVs can pass through a series of viewpoints smoothly and safely. However, some small areas are missed, which forces the UAV to plan an additional trajectory to explore these areas again for fuller map coverage. In conjunction with the yaw planning strategy in the "Exploration yaw planning" section, our work solves this problem and reduces the whole path length and exploration time. The performance details in the simulation will be shown in the "Experiment results" section.

Exploration yaw planning
As the quadrotor dynamics are differentially flat, 22 we can independently plan the yaw without being limited by the 3D position.
The yaw is the eye of the quadrotor to obtain environment information, and its planning should be valued in the exploration work.
As explained above, the best viewpoint includes not only a position but also the next-best-view yaw angle, which is used as the end yaw angle. However, the yaw angles during trajectory execution receive no special treatment. Zhou et al. 3 roughly plan the exploration yaw in three steps: (1) sample several points on the trajectory; (2) obtain the yaw parallel to the direction of velocity by the arctangent; (3) optimize in the same way as described in the "Upgraded trajectory optimization deployed in exploration" section to obtain a smooth yaw angle over the whole trajectory. Figure 7 represents a common situation in the original work 3: the UAV has executed the trajectory and explored the black frame region; in the subsequent exploration, a yaw angle that still faces the black frame is therefore of little help in obtaining new environment information. Furthermore, because of this deficient strategy, the drone loses the opportunity to explore the red frame region along the way, so extra planning is needed for this unknown region.
To solve this problem, we propose a smarter yaw planning method, shown in Figure 8, which considers the beneficial effect of yaw during flight. There are two ways to accomplish a yaw angle change: rotating through the inferior angle or through the reflex angle. In this article, we choose between the two rotation directions according to the current environment information.
The unknown area we want to explore, which is specified by the frontier center as in the "Upgraded trajectory optimization deployed in exploration" section, is checked first before yaw planning. The yaw planner then determines the rotation direction (reflex or inferior) of the yaw based on the relative position of the frontier center. The specific process is shown in Figure 9, starting with (1) determining the direction (left or right) of the frontier cube center $X_{ct}$ mentioned in the "Upgraded trajectory optimization deployed in exploration" section.
Benefiting from this small but careful strategy, the UAV can make full use of yaw to explore more unknown areas and avoid wasting exploration resources. We then test our proposed method by simulation in the "Experiment results" section.
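One way to read the rotation-direction choice is sketched below: the side of the frontier center relative to the current heading is found with a 2D cross product, and the yaw then turns toward that side even if this means rotating through the reflex angle. This is an interpretation for illustration only, since the full procedure is given in Figure 9; the function name, the 2D simplification, and the tie-breaking rule when the frontier lies straight ahead or behind are assumptions.

```python
import math


def signed_yaw_change(yaw_now, yaw_goal, pos, x_ct, eps=1e-9):
    """Signed rotation (rad) from yaw_now to yaw_goal, turning toward x_ct's side.

    Positive values are counter-clockwise (left) turns.
    """
    hx, hy = math.cos(yaw_now), math.sin(yaw_now)
    # z-component of heading x (x_ct - pos): > 0 means the frontier is on the left
    cross = hx * (x_ct[1] - pos[1]) - hy * (x_ct[0] - pos[0])
    ccw = (yaw_goal - yaw_now) % (2.0 * math.pi)  # left-turn angle in [0, 2*pi)
    if ccw == 0.0:
        return 0.0
    cw = ccw - 2.0 * math.pi                      # equivalent right-turn angle
    if abs(cross) < eps:
        return ccw if ccw <= math.pi else cw      # no side preference: shorter turn
    return ccw if cross > 0.0 else cw


pos = (0.0, 0.0)
turn_left = signed_yaw_change(0.0, math.pi / 2, pos, x_ct=(0.0, 3.0))
turn_right = signed_yaw_change(0.0, math.pi / 2, pos, x_ct=(0.0, -3.0))
print(turn_left, turn_right)
```

With the frontier on the left the yaw takes the inferior quarter turn; with the frontier on the right it takes the reflex three-quarter turn, sweeping the sensor across the unknown side on its way to the same end yaw.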

Simulation and analysis
We verify our proposed method in simulation and compare it with the original framework FUEL. 3 For the reliability of the comparison, we tested three large-scale maps, which are similar to the indoor office scenario and have complex structures that make exploration challenging. To bring the simulation closer to the real situation, we set the starting point of all tests in the center of the map. In addition, we performed all simulations and real-world tests ("Real-world tests" section) on an Intel Core i7-8550U CPU in real-time.
The details of all simulation tests are shown in Table 1. The parameters used by the compared method FUEL 3 are the same as those in Table 1 (without $\lambda_e$) to ensure the fairness of the comparison.
Different parameters influence the characteristics of the trajectory, and particular features of the trajectory can be tuned through the parameters according to the requirements of the task. $v_{max}$, $a_{max}$, and $\zeta_{max}$ are the dynamic limits of the quadrotor, and the trajectory is constrained to this range. We set specific parameter values that are suitable for fully verifying the improvement of our method. $\lambda_s$, $\lambda_c$, and $\lambda_d$ influence the smoothness, safety (distance to obstacles), and dynamic feasibility. $\lambda_{bs}$ and $w_t$ are parameters that need to be balanced against each other, and they affect the smoothness and rapidity of the trajectory between two viewpoints. In our method, $\lambda_e$ determines how close the whole trajectory is to the frontier information of the environment. The above parameters are all weights, which means that increasing a parameter increases the weight of the corresponding optimization term. The parameters have been tuned through extensive simulation tests to meet the needs of autonomous exploration in different cluttered environments. In the "Real-world tests" section, further real-world experiments show that these parameters also perform well in real exploration.
The FOV of the sensor is set to [80 × 60] deg with a maximum range of 4.5 m.
To reduce the influence of chance, each method is tested five times with the same initial parameters on every map. Performance and statistics of the two methods are shown in Figure 10 and Table 2.
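The sensor model above can be pictured as a simple visibility test. The sketch below checks whether a point expressed in the sensor frame lies inside an 80 × 60 deg frustum with a 4.5 m range. Treating 80 deg as the horizontal and 60 deg as the vertical angle, the axis convention, and the function name are assumptions; occlusion by obstacles is not modeled.

```python
import math


def in_fov(point, h_fov_deg=80.0, v_fov_deg=60.0, max_range=4.5):
    """True if `point` (sensor frame: x forward, y left, z up) is inside the FOV."""
    x, y, z = point
    if x <= 0.0 or math.sqrt(x * x + y * y + z * z) > max_range:
        return False  # behind the sensor or beyond the sensing range
    h_angle = math.degrees(math.atan2(abs(y), x))
    v_angle = math.degrees(math.atan2(abs(z), x))
    return h_angle <= h_fov_deg / 2 and v_angle <= v_fov_deg / 2


print(in_fov((2.0, 1.0, 0.0)), in_fov((1.0, 1.0, 0.0)), in_fov((5.0, 0.0, 0.0)))
```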
The simulations verify our above conjecture about the redundancy of the trajectories due to repeated planning.
The original work FUEL, 3 as shown in Figure 11, misses part of the environment information when passing through complex environments such as corners, resulting in duplicate planning. In contrast, our method, which adds the environment cost term, makes drones approach unexplored areas more intelligently and makes local exploration more thorough. It avoids leaving small residual parts of an area unexplored, which would cause the UAV to turn back and repeat exploration after the other areas have been explored.
As shown in Figure 12, FUEL 3 always heads toward the wall during trajectory execution, resulting in a waste of exploration resources. In contrast, our method takes into account the frontier information behind the UAV, especially in open areas; by planning the yaw separately during flight, the UAV can discover more unknown areas in the same exploration time.
The above simulation experiments confirm the validity of our work which can improve the efficiency of exploration.

Real-world tests
To further validate the applicability of our method, we conduct extensive real-world experiments in a typical field application scenario as shown in Figure 13. The need for drones to explore efficiently through 16 × 16 × 2 m³ of dense woods imposes stringent planning requirements. In real-world experiments, we localize the quadrotor with the visual-inertial state estimator VINS-Fusion 27 and adopt a classic controller 22 for quadrotors to track the trajectories. The experiments were carried out on our DIY quadrotor platform shown in Figure 14. The hardware configuration of the platform is shown in Table 3.
To verify the stability of the proposed method, we conducted several experiments in this scenario. The limits $v_{max} = 1.0\ \mathrm{m/s}$, $a_{max} = 0.5\ \mathrm{m/s^2}$, and $\zeta_{max} = 1.1\ \mathrm{rad/s}$ are set to constrain the quadrotor's dynamics. Similar to the simulation tests, the UAV quickly acquires information about the surrounding environment in open areas through reasonable yaw planning as shown in Figure 15. Besides, as shown in Figure 16, thanks to the proposed environment cost term, the UAV does not head to other areas to continue exploring until it has thoroughly explored the corner areas, avoiding lengthy flights due to duplicate planning for small areas that have not been fully explored. All in all, in real-world experiments shown in Figure 17, the upgraded trajectory planning method we proposed reduces trajectory redundancy caused by repeated planning and improves planning efficiency.

Conclusion
This article proposes an upgraded trajectory planning method for autonomous exploration. Existing methods focus only on the environment gain at the viewpoints and ignore the trajectory execution process, resulting in repeated, inefficient planning and redundant trajectories.
To address the problem, we present (1) a new environment cost term in trajectory optimization, which brings the trajectory closer to frontiers and gains more environment information in flight, (2) a more flexible yaw planning strategy, which gets a set of yaws heading toward the unknown area and reduces waste of exploration resources, and (3) tests of the proposed method on different simulation maps and further verification in real-world experiments.
Based on these improvements, UAVs can explore more thoroughly in complex areas and faster in open spaces. Both simulation and real-world tests show that our method can effectively alleviate the trajectory redundancy caused by repeated planning and speed up the exploration process, thus reducing exploration time and flight distance.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by National Natural Science Foundation of