Online paths planning method for unmanned surface vehicles based on rapidly exploring random tree and a cooperative potential field

The unstructured, dynamic marine environmental information and the cooperative obstacle avoidance problem greatly challenge the online path planner for unmanned surface vehicles. Efficiency and optimization are crucial for online path planning schemes. Thus, we proposed an algorithmic combination of the optimal rapidly exploring random tree and artificial potential field methods. First, we built a repulsive potential field by considering the relative velocity and position of the unmanned surface vehicle to obstacles and the international regulations for preventing collisions at sea, wherein we designed a repulsive force calculation method using radar readings to avoid irregular obstacles. Then, we guided the sampling process of rapidly exploring random tree using the potential field to accelerate the convergence rate of rapidly exploring random tree to low-cost obstacle avoidance paths. Finally, we planned for multiple paths based on the leader–follower architecture with the guidance of a cooperative potential field. In the experiments, the proposed method consistently outperformed the benchmark methods. We also verified the effectiveness of the algorithmic modifications by conducting ablation experiments.


Introduction
Online paths planning (OPP) is necessary for cooperative tasks of autonomic unmanned surface vehicles (USVs), such as cooperative search and rescue in the open sea with partially unknown locations and shapes of static obstacles (e.g. islands) and dynamic obstacles (DOs; e.g. ships) that unpredictably appear, disappear, or move. 1 The manipulators of the planned paths are assumed to be small, lightweight USVs with high maneuverability and relatively low speed. Thus, USVs can follow the path to avoid complex obstacles. The simplified state vector of USVs comprises surge, sway, and yaw variables. The planning space is projected from the state space.
Efficiency is the first requirement of the OPP method for the quick response to real-time sensed data to capture cooperative paths. However, the path planning problem in dynamic environments is computationally complex. Meanwhile, cooperative constraints, such as the position coupling between USVs in the obstacle avoidance (OA) process, [2][3][4] greatly challenge the online path planner. If paths are planned individually, the computational complexity is probably unacceptable. Moreover, the high path quality is also a crucial performance indicator for the online planner. Besides the geometric features of paths, the planner should also consider the influence of time-varying sea current. [5][6][7] The studies of literature [7][8][9] suggested that the energy-efficient path should contain various downstream sections. In summary, the OPP technology for USVs is far from mature. This observation is the motivation of this study.
The optimal OA control for a single USV is complicated because of the coupled differential equations with boundary conditions and motion constraints. [10][11][12][13][14][15] Thus, the problem is decoupled into subproblems, 16 such as path planning and optimal control. The model predictive control (MPC) frame is a typical decoupling method that requires the online path planner to provide committed local paths iteratively in short periods. On the basis of MPC, we focus on the path planning problem, leaving the complicated optimal control problem to the path executive module. We initially plan a local path reaching a subgoal that is selected within the sensing domain of USVs. Another local path planning iteration starts when USVs execute the available path. The iterations roll forward until the USVs achieve the goal.
The intelligent searching and reasoning methods are often too computationally complex to apply online. 17 The prevalent reinforcement learning method 18 has difficulty in solving the multiple paths planning problems. The artificial potential field (APF) method quickly queries paths in the gradient descent direction of the potential field that attracts the USV to the target and repulses the USV away from obstacles.
To address the problem of the goal nonreachable with obstacles nearby, Ge and Cui improved APF by defining a repulsive potential field taking positions of the USV and the goal into account. 19,20 Repulsive force was divided into components in the direction from the obstacle to the USV and in the vertical direction of the relative velocity of the USV with respect to the obstacle in the study. 20 The studies of literature 21,22 promoted the potential field definition by considering more OA parameters.
Ge et al. employed an instant-goal-driven method to compute the repulsive force using radar readings. 23 However, the method did not consider the situation when radar readings are from multiple obstacles. Solari et al. calculated the point potential at a certain distance from the obstacle on each beam of the mechanical scanning sonar and selected the motion direction of the vehicle by the lowest potential. 24 Lyu and Yin et al. improved the repulsive field definition by considering OA angle and velocity. 25,26 Wang et al. proposed a ring-shaped repulsive potential field for USV swarm control. 15 Wang et al. developed APF by combining it with the ship domain model to consider the DO's velocity and course. 27 To bypass obstacles smoothly, Qin et al. added a velocity damping coefficient to the repulsive potential function. 28 Balch and Hybinette employed virtual forces toward the attachment sites of USVs to keep the formation 29 and investigated the social potential for scalable vehicle formation. 30 Ge and Fua investigated queues and artificial potential trenches for multirobot formation. 31 Sun et al. studied the collision avoidance problem for UAVs using APF. 32 However, APF depends on the elegant mathematical analysis based on known, static environmental information, which is unpractical. Uncertainties in the environmental information and the USV motion model make the potentials incomparable, even causing the local optimum.
Rapidly exploring random tree (RRT)-based methods were proposed to plan/replan feasible paths quickly with differential, nonholonomic, and inequality constraints. [33][34][35] The optimal rapidly exploring random tree (RRT*) asymptotically converges to the optimal path by incrementally rewiring the path tree. After new states are added to the path tree, they are also considered as replacement parents for existing nearby states in the path tree. RRT* can conveniently solve the local optimum. [33][34][35] However, the efficiency of RRT* is possibly low. 36,37 When an initial solution path is found, the redundant points of the path are pruned by the RRT*-Smart 36 method, and the remaining states as used as biases for further sampling. However, the method may reduce the probability of finding a different homotopy class of the initial solution and further violate the RRT* assumption of uniform density. 37 Gammell et al. 37 presented the informed RRT* method to improve the efficiency of RRT* by focusing the path planning in terms of directly sampling a prolate hyperspheroid subset that is defined as x ellipse ¼ Lx ball þ x center , where x center is the center of the hyper-ellipsoid in terms of its two f local points that are often set as the start and goal points, x ball is the unit ball, L ¼ diagfc best /2, 0.5*(c best 2 -c min 2 ) 1/2 , . . . , 0.5*(c best 2 -c min 2 ) 1/2 g, c best is the cost of the current best path, c min is the distance between the start and goal points, and the dimension of the diagonal matrix L equals that of the planning space.
Otte and Emilio 38 proposed the RRT X method for efficiently avoiding DOs. Rewiring operations cascade down the affected branches using existing nodes to remodel the existing search graph and repair the shortest path-to-goal subtree. Replanning efficiency is improved by maintaining graph connectivity information in local neighbor sets stored at each node with the expected number of O(log n), where n is the number of nodes, as well as aborting rewiring cascades once the graph becomes consistent. Moreover, graph consistency ensures that the newly added nodes and edges are useful for improving the solution.
The study by Wang et al. 39 improved the probabilistic sampling with the 2D Gaussian mixture model to sample around obstacles for short OA paths. The study in Jaillet et al. 40 used the cached historical paths as heuristics to improve replanning efficiency. Transition-based RRT promoted sampling in terms of the environment configurationspace cost maps. 41 The above methods only consider the DO to be errant. Wen et al. 42 assumed that USV can negotiate with DOs and then sampling is biased by the International Regulations for Preventing Collisions at Sea (COLREGs).
As one of the state-of-the-art OPP solutions, the potential field was combined with RRT*. [41][42][43][44] Qureshi et al. directionalized random samples of RRT* through potential functions for reducing the number of planning iterations. [43][44][45] Wang et al. solved the motion planning problem by the bidirectional potential guided RRT* method. 46 In this article, the RRT* efficiency is improved by the following schemes: the local potential field guides the planning toward reasonable areas in terms of the real-time feedback of sensors; the potential field is improved by considering the velocities of USV and DO; and the given environmental information is leveraged by selecting subgoals referring to the offline planned global path. First, the virtual leader's local path is planned to obtain multiple cooperative USV paths. Then, the expected locations of USVs referring to the leader are defined, and multiple paths are planned by the distributed local planners in each USV. The cost distance for the RRT* rewiring is defined by considering sea current, path length, and path turns to reduce the energy expenditure of paths. Therefore, the main contributions of this study are twofold. (1) We improve the efficiency of RRT* to accelerate the convergence to energy-efficient and cooperative OA paths by the guidance of a newly built potential field in dynamic environments, wherein the repulsive potential field considers the OA-related velocities and rules. It is calculated from the radar readings. (2) The online multipath planning architecture is constructed based on the MPC and virtual leader frameworks.

Sketch of the paths planning system for USVs
The hierarchical architecture is divided into submodules, 8,9,27,28 as shown in Figure 1. On the global path planning layer, the potential field used for capturing the global reference path is built in terms of the given environmental information, and the parameters are set empirically. During the online planning procedure, the parameters remain unchanged. Then, the OPP module plans for feasible, low-cost paths for the virtual leader, wherein samples of RRT* are directionalized by the potential field. On the cooperative path planning layer, the cooperation attachment sites of the USV formation are defined by following the virtual leader, and the cooperation potential field is then constructed. Finally, the OA paths are planned locally for USVs.

Objective of the OPP
Three important properties of the paths are considered during the OPP process, i.e. low collision probability, feasibility, and the likelihood of success. 23,12 The likelihood of success requires paths to be energy efficient with few aggressive turns. The planning objective expression is listed in the following where F cons ¼ 0 means the feasible path needs to satisfy motion constraints, M is the number of USVs, N is the number of the discrete waypoints, e i , k denotes the energy cost against the sea current on a path section t i , k between adjacent waypoints: p i , k and p iþ1,k , of the kth USV, v i , k denotes the control cost on t i , k , and length(Á) represents the length of t i , k .

USV motion model
Usually, a physical USV model is simplified to establish a control model that satisfies system requirements, and the characteristics of the USV structure are moderately simplified.
We truncated the freedom degrees of USVs to surge, sway, and yaw for the OPP burden reduction. That is also because the influence of heave, pitch, and rolling motions is supposed to be relatively small on the OPP procedure in ordinary sea states. Figure 2 shows the Earth-fixed inertial frame fig and the body-fixed frame fbg. The positive X-axis of fbg is in the USV heading direction, and the origin locates at the barycenter of USV. The vehicle is well designed to be symmetric so that the barycenter coincides with the center of the hull. The experimental dynamic model is listed below according to the study in Mousazadeh and Kiapey 47  where [_ x,_ y, _ q] T and [u, v, r] T are the velocity vectors in fig  and fbg, respectively, and d u , d v , d r , d ui , d vi , and d ri (i ¼ 2, 3) are the hydrodynamic damping weights. The m ii (i ¼ 1, 2, 3) denotes the mass parameters, and the mass matrix is diagonal. t u and t r denote the surge force and yaw moment, respectively. The parameters in Table 1 refer to a real USV, and they come from the study in Mousazadeh and Kiapey. 47 1d r , L, and W are the length and width of the hull, respectively.
To guarantee the dynamic feasibility of the resulting path, we take motion constraints into account during the OPP process. The inequality constraints of the maximum velocity and acceleration are defined by the performance limitations of USV according to formula (2). The minimum turning radius is set as r min ¼ 4(u 2 þv 2 ) / o max , where o max is the maximum angular acceleration of USV, thus the extendable domain for a path tree node is fan-shaped. The nonholonomic differential of USV motion constraint is sinqÁdx À cosqÁdy ¼ 0, where dx and dy are the X and Y increments in fig, respectively. The constraints are checked during each path extension iteration to ensure the feasibility of the planned path sections.

Collision detection
When a DO enters the working domain of USV, the collision cone scheme was improved to assess the collision situation by the intention of DO, 10 and the collision navigational angle ranges were computed according to the relative velocity of the USV with respect to DO Figure 3 illustrates that v io is the relative velocity of the USV with respect to the DO, r o is the safe radius from the USV to the DO, p i ¼(x i , y i ) denotes the position of the USV, p o ¼(x o , y o ) denotes the position of the DO, v q is the component of v io on the direction perpendicular to the line p i p o , C is the intersection of the expected USV path on the safe circle centering at the DO if the USV and the DO maintain their velocities, and v r is the component of v io on the direction of p i p o . If v q ¼0, then the relative trajectory of the USV to the obstacle is right on the line of p i p o . If v r >0, then the projection of v io on p i p o is positive, and the USV has a tendency of becoming close to the obstacle; otherwise, the USV is more likely to separate from the DO. Figure 3 shows that collision probably occurs at point C. If v r >0 and the direction of v io is within the domain from p i A to p i B that are the tangents of the circular bounding box of the DO from the USV location, then collision possibly occurs.
(v q ) pA and (v q ) pB are the components of v io in the direction perpendicular to lines p i A and p i B, respectively. Thus, the following collision detection condition can be obtained.
If one of the conditions, (1) v q ¼0 and v r >0; (2) (v q ) pA Á(v q ) pB 0 and v r >0, is met, then USV is probably collision with the obstacle. The proof of the second condition refers to in appendix 1. The practical style of the second condition is as: , where d io is the distance between USV and DO. The collision detection algorithm is as:

Cooperative potential functions design for avoiding obstacles
The quadratically ring-shaped attractive potential field is constructed around the goal where d it is the distance from USV to the target. The attractive force is calculated by F att ðqÞ ¼ xd it n it , where n it ¼ À!r(q i , q target ) is the direction unit vector from USV to the target, and we set x to be 1. We defined a ring-shaped repulsive potential field around each obstacle at the region: r o <d io <R o , where R o is the radius to conduct the OA action. We set R o to be 5 kilometers in open water areas and 500 m in narrow passage areas empirically. The repulsive force from obstacles is calculated as follows, and n is the number of obstacles Table 1. Parameters of the USV dynamic model. Figure 2. Schematic of USV planning frame system. USV: unmanned surface vehicle. Because thus, we can get that @v io /@v¼n io and @v io /@p ¼ (v io n io À (v i Àv o ))/d io according to the study in Ge and Cui. 20 Since where n io is the unit direction vector from USV to the obstacle and n io? is the unit vector perpendicular to n io .
The time to perform the OA action is also important for both the refinement and the safety of the path. 28 We set the safe distance as where a m is the maximum deceleration, thus USV has enough space for OA. We set the repulsive force to be infinity if d io r o for emergency OA. The repulsive force is calculated by formulas (6) and (7), and Vectors for the construction of the repulsive potential field are shown in Figure 4. Thus, the repulsive force component along n io pushes the USV away from obstacles, and the component along n io? provides steering torque for  USV to avoid obstacles. We add a positive or negative weight on n io? to change the direction of the steering vector according to the COLREGs.
If the relationship between DO and USVs is cooperative, then USV chooses the OA direction according to the COL-REGs and the intention of DO action that is assumed to be given. Such a relationship exists between USV members in a formation or the communicated USVs and DOs. Then, we add the positive or negative sign to n io? to determine the direction of the repulsive force in terms of the absolute angle (q s , 0<|q s |<p) that is from the motion direction of USV to that of DO. If |q s | ¼ 0 or p, then we adjusted v o by a small angle clockwise. Thus, according to the COL-REGs, the direction of n io? is the steering direction of USV that enlarges |q s |. Figure 5 shows four typical OA situations of USV. Then, the OA direction and the repulsive force are determined. If the relationship between USVs and DO is not cooperative, then no rules should be complied with, and USV avoids DO by a low-cost path.

OPP algorithm for USVs
The potential field of the global environment is constructed according to the given environmental information beforehand. This procedure is regarded as environment preprocessing for online burden reduction. The local potential field should be rebuilt during online planning if the environmental information changes or a DO occurs. The online built local potential field may have the local optimal problem. Thus, the randomness of the RRT* extension is kept to guarantee probabilistic completeness. This feature of RRT* can help the planner escape from the possible local optimum of the local potential field.

Local potential field modeling
The traditional APF method handles the obstacles with irregular shapes by two schemes, i.e. the circular bounding box method and the decomposition method. The bounding box method considers the obstacles by the circular bounding boxes, and then, the distance between USV and obstacle is easy to compute. However, if the obstacle is thin and long, the circular bounding box method is rather inaccurate. The decomposition method splits the irregular shape obstacle into unit circulars, but the decomposition process is complicated, and the method intends to only reflect the local features of obstacles. Front-facing scanning radar is a sort of crucial sensor for OA, thus if the local potential field is constructed according to the characteristics of radar data, then the virtual forces can be calculated directly by the radar readings efficiently. The distances and obstacle's occupied area information that is crucial for OPP are easy to be acquired from radar readings. 23 Meanwhile, OPP generally needs the information on obstacles in front of USVs rather than that behind USVs. Thus, the radial modeling method is applicable. The local environment is expressed in vector form as follows: R ¼ [R 1 , R 2 , . . . , R N ] according to the study in Ge et al.. 23 As shown in Figure 6(a), N), where R d is the minimum distance between USV and obstacle on a radar beam and R r is the radius of the sensing range.
Since consecutive radar readings can be categorized as the feedback of the same obstacle. As shown in Figure 6(a), R 0 , R 1 , and R 2 are considered as the readings of Obs1, and R NÀ2 , R NÀ1 , and R N are considered as the readings of Obs2. Then, the relatively closest reading from the same obstacle can be used to calculate the repulsive potential field, for instance, R 1 is used to compute the repulsive potential field from Obs1 because R 1 returns the minimum distance among all the readings from Obs1. For the same reason, R NÀ1 can be used to calculate the repulsive potential field from Obs2. Then, the repulsive force F rf from obstacles is calculated as the resultant force of F f1 and F fNÀ1 .
The direction of the kth reading is denoted by n ¼ [cosq(k) sinq(k)] T , where q(k) ¼ 180-(kÀ1) qs, and the motion direction of USV is defined as the reference direction of zero degree angle, qs is the angle between a pair of adjacent readings, and qs is determined by the angular resolution of the sensor. Figure 6(b) is an expressive potential field vector constructed by the proposed method.

Cooperative potential function-based RRT* method
Leveraging the advantages of the baseline APF and RRT*, a cooperative potential function-based RRT* (CPRRT*) scheme is proposed. The new cooperative potential function is used to guide the exploration and exploitation processes of RRT*, eventually improving efficiency. The RRT* method is used to handle the motion constraints and the possibly unstructured and changing information to plan paths online. The potential functions are used to directionalize samples to promising areas where the optimal path more probabilistically exists, to reduce the number of iterations. The samples adjustment and path tree rewiring of CPRRT* are regarded as the exploitation or refinement process for enhancing the energy efficiency of paths.
The potential function-based sampling process and totally random sampling process are performed alternately by a probabilistic threshold t p to keep the balance between exploitation and exploration. The description of the OPP algorithm is listed below.
The following steps are conducted iteratively until the OPP time expires.
1. A sample q s is spawned randomly according to the uniform probability distribution in the sensing domain of USV, and the nearest node (q n ) to the sample q s is searched on the path tree. 2. A random number rand2[0,1] is generated. If rand is lower than the probability t p , then the potential function-based sampling scheme is performed, else, the random sampling procedure is conducted. The potential function-based sampling scheme is as follows. The sample q s is directionalized by following the cooperative potential field for k (set as 3) times if the adjustment is collision-free. If a collision happens, then the directionalizing process will stop and the last collision-free sample during the adjustment will be kept. 3. The path tree node q n is extended toward q s . If the extension succeeds, then a new node is added to the path tree and the locally topologic path tree rewiring process is conducted. The extension step length is where v U is the estimated velocity of USV at q n and Dt is the control step duration. The expected motion direction of USV coincides with the extension direction. Figure 7 illustrates the sampling adjustment where q s is the original sample, and q s1 , q s2 , and q sk (k¼3) are the directionalized samples.

Algorithm implementation
ALG.1 shows our OPP algorithm. The online architecture is constructed based on the MPC framework. After a subgoal is selected, the local planning iteration starts and runs in the predefined local planning time. The USVs sail by following the currently available path, and the environmental information in the sensing range of the USVs is updated at each path extension iteration. If the local planning time runs out, then a local path is queried, and the next local planning iteration begins. The local planning with the sensing range rolls forward until the USVs achieve the goal.
At the beginning of each local planning iteration, a subgoal for the current local planning is selected, as shown by the second line, to leverage the given global reference path. If the current sensing range of USVs intersects with the global path, then the furthest waypoint to the USVs on the global path in the sensing range is selected as the subgoal. If the sensing range does not intersect with the global path, then the sensing boundary is divided into discrete points, and the discrete point nearest the global path is selected as the subgoal.
In the fifth line, the repulsive potential forces from obstacles are calculated according to the characteristics of the radar data. The eighth line shows the path tree extension of RRT* guided by the cooperative potential field. The tenth line shows the baseline RRT* extension process.
In the eleventh line, a local path is queried after the local path planning iteration expires, then the redundant waypoints are reduced, and the simplified path is smoothed via the Dubins curve. That is because the OPP method may plan a large number of waypoints, thus, the planned path is probably complex and zigzagged. To make the path easier to follow, the path should be simplified to reduce redundant waypoints and turns. We first classify the peaks of turns on the paths. If the turning angle is big, then we attempt to delete the peak by connecting the points before and after the peak. If the point is the peak of consecutive turns, then we also attempt to delete the point. The redundant waypoints reducing process is conducted sequentially until the q n Environmental potential field RRT* path tree q s q s1 q s2 q sk Figure 7. Illustration of the sample adjustment.

ALG.2 RRTStarExtension(q rs , T)
Input: Sample, q rs ; Cost weights matrix, W; Cooperative potential field, P E ; Current vectors, V cur ; Path tree, T fV, Eg; Path tree nodes, V; Path tree branches, E 01: S near NearbyNodes(T, q rs ); 02: S near .sort(); 03: q n ¼NearestNode(T, q rs )) 6 ¼Null 04: IF CollisionFree(E new ¼Extend(q n , q rs )) THEN 05: T Radar-based local environment modeling; 06: Random sample q s spawning and rand2 0,1 generation; 07: IF rand t p THEN 08: CPRRTStarExtension(q s , T, P att , P rep , V cur ); 09: ELSE /*IF rand >t p */ 10: RRTStarExtension(q s , T); 11: Local path querying and simplification, then smoothing; 12: Virtual leader path following and USVs cooperative paths planning; end of the path. In the twelfth line, USVs cooperative paths are planned by following the virtual leader of USVs. ALG.2 shows the extension process of the baseline RRT*. In the first line, the RRT* method finds the near nodes of q rs within a ball of the radius: R n :¼gÁ(logn/n) 1/d , where g>(2(1þ1/d)) 1/d Á(m(X free )/z b ) 1/d by means of the requirement of asymptotical optimality of RRT*, d is the dimension of the planning space (3), n is the number of path tree nodes, and m(X free ) and z B denote the volumes of the obstacle-free space and the unit ball, respectively. In the second line, the path tree nodes in the stack of S near are sorted in ascending order by their distances to q rs . In the third line, the node q n is the nearest path tree node of q rs . E new in the fourth line and q new in the fifth line are the newly added path tree edge and tree node after the path tree extension, respectively.
The path tree locally rewiring (refinement) iteration is shown from the sixth line to the eighth line. In the sixth line, the function pop(Á) returns the top element of the stack S near . In the seventh line, the cost value of a node q j means the sum of all the costs of path sections from the tree root to q j , and the C value is the cost of a path section.
The cost and C values are computed by means of formula (8). If the cost of a node q j decreases when q j takes q new as the father node and q j can extend to q new by a collision-free path, then q j is rewired to q new and takes q new as its new father, and the path section from the original father of q j to q j is deleted.
The cost function is defined by consideration of the path execution difficulty, energy expenditure, and safety, in terms of the path length, and the compliance of the path to the potential field or current vector Cðq i ; q j Þ ¼ w l Á sinða T =2Þ þ w y Á C y þ w d Á distðq i ; q j Þ s:t: C y ¼ cosðp À a y Þ; 0 a y p=4 or 3p=4 a y p p=4 a y 3p=4 (8) where a T is the angle between the vector from q i to q j (marked as t ij ) and the previous path section vector toward q i , and 0 a T p, the consideration of a T is to reduce the numbers and aggressiveness of turns. The angle between t ij and the direction of potential field or current vector is denoted as a y , 0 a y p. The task-specific weighting coefficients are tuned to be w l ¼ w y ¼ 1 and w d ¼ 0.2, related to the energy costs by the actions of moving straight, turning, and conquering the impeding of current. Since the formula is used to compute local cost, and the distance between q i and q j is generally one extension step length that is generally less than 5 m in the study, thus the weights can guarantee the action cost items to be comparable. The function of dist(.) is used to compute the distance between two nodes.
The current can significantly influence the USV motion, thus it should be considered during the OPP procedure. Since the horizontal current flow can be regarded as stationary and uniform, locally, thus we can consider it by the angles (denoted by a y 2[0,p]) between the direction of USV motions and those of the current, to improve the energy efficiency of the path. [5][6][7][8] When a y < p/4, cos(pÀa y ) is negative, and the downstream path helps to save fuel for the sailing. When 3p/4 a y p, the countercurrent path segment may be fuelcostly. Since the ability to resist transverse current interference of USV is weak, thus we consider the interference of the transverse current by sina y , when p/4 a y 3p/4. The OPP process is guided by a y to plan for energy-efficient paths, such that most sections of the path are downstream, and just a small amount of sections are transversecurrent ones.
ALG.3 shows the potential function guided path tree extension process. In the first line, q s is assigned to q rs, 1 as the initial sample. In the third line, d min denotes the minimum distance between the sample q rs, i to obstacles. In the fourth line, r o is the minimum safe distance between USV and obstacles, and k (3) is the fixed times of adjustment. In the fifth line, R o is the reactive radius for OA. In the sixth and eighth lines, the function PVecSyn(Á, q s ) is defined for synthesizing the potential vectors at q s . If the sample is outside the OA reactive range, then the synthesizing process will not consider the repulsive force from obstacles. In the ninth line, Fcal(Á) calculates the resultant force by means of P total . In the eleventh line, the sample is directionalized for a distance of l along the direction of F total , where l is the extension step length of RRT*. In the thirteenth line, the condition means the adjusted sample is within the minimum distance from obstacles, then we extend the path tree toward the last adjusted collision-free sample q rs, iÀ1 . The fifteenth and sixteenth lines mean the adjustment process is conducted ALG.3 CPRRTStarExtension(q s , T, P att , P OA , V cur ) Input: Randomly spawned sample, q s ; Path tree, T; Attractive potential field, P att ; Repulsive potential field, P rep ; 01: for k times, and we extend the path tree toward the adjusted sample q rs, i .

Algorithm analysis
The heuristic sampling process is used to enhance the efficiency of RRT* because the computational complexity of generating samples is far lower than that of the path extending, and high-quality samples can accelerate the converge process of RRT* to optimal paths. 45 The qualitative illustration of the probabilistic completeness of CPRRT* is listed below. The path tree of CPRRT* is a connected tree rooted at the initial state of USV, and the path tree extends toward samples from their nearest tree nodes. The potential field is just used to accelerate the path searching process of RRT* by guiding samples to avoid obstacles and toward a goal probabilistically. [43][44][45][46] Moreover, the samples remain probabilistic randomness to explore the whole space.
CPRRT* converges to high-quality paths, and the formal proof is as follows. A collision-free path t* is said to be optimal in terms of Euclidean distance, if it has weak d-clearance, meaning that the path can be deformed by a collision-free homotopy h(y) to another path t with bigger distances than d from obstacles; t has the same start and end points to t*; and t* has distance no bigger than d from obstacles. 35 The path t* is provided with the set of all feasible paths such that C(t*) ¼ argmin t2feasible fC(t)g, where feasibility means that paths satisfy motion constraints of USV and are collision-free as well as not violating any OA rule. CPRRT* performs exploration and exploitation of configuration spaces to refine paths. The heuristics of CPRRT* guide paths toward the weak d-clearance by directing samples to spaces with a high probability of containing an optimal path. The potential field continues directing samples down the slope of the potential field to the obstacle-free goal region until the sample gets very close to obstacles or the threshold is reached. The algorithm parameters are set to be the same as RRT* except that the rewiring process also considers the current to adjust the path toward the energysaving direction. Therefore, the OPP solution of CPRRT* can converge to a high-quality path that considers the path length, ease of following, and the influence of current on the USV motion, comprehensively.
However, path optimality is hardly achieved online due to the limited time, and efficiency is crucial for an online planner. To balance the complexity against efficiency, a heuristic potential field function guided sampling scheme is devised for leveraging the known environmental information. Therefore, CPRRT* is probably more efficient than the baseline RRT* method.

Potential field for cooperative paths planning
After a path section is planned online, the virtual leader is supposed to follow the path. Thus, the position and velocity of the virtual leader can be calculated in real time. Then, the path planning targets of USVs are estimated.
We define the inner potentials to keep the USVs formation and then the formation keeping potential field and the OA potential field are synthesized. The USVs in the triangle formation are illustrated in Figure 8, where C 0 is an inertial coordinate system, C Fi denotes the formation coordinate system, and the coordinate (x ci , y ci ) of the virtual leader in fC 0 g is the origin of fC Fi g. We define The ring-shaped formation keeping potential is constructed around each USV according to the studies in Ge and Cui 20 and Qin et al. 28 The attractive potential field is constructed in the following formula where p tar denotes the position of the target, v x denotes the velocity of the virtual leader, p i and v i are the actual position and velocity of the USV member, and a p and a v are specified to be 0.7 and 0.3, respectively. As shown in formula (11), the attractive force has two components where F att1 is used to follow the position of the target and F att2 is used to follow the velocity of the virtual leader. The direction of the unit vector n it is from the position of USV to that of the motion target, and the direction of the unit vector n vit is from the direction of the USV velocity to that of the virtual leader USVs will be pushed back by the repulsive forces from other members for OA if they enter the domain of other members, to keep the activities of each USV within their own allowable range. The repulsive potential field from other USV members is defined in the following formula where the parameters of the repulsive potential field for formation keeping are the same as those in formula (7), and n is the number of the USV members. A USV member considers other members as obstacles, and d ij is the distance between members. The steering direction of USV i avoiding USV j is determined by n ij? , and the calculation of the direction of n ij? is similar tothat of n io? in Figure 5 according to the COLREGs. Formula (13) shows the repulsive force from USV j that is consisted of three components, F rij1 n ij , F rij2 n ij? , and F it n it . Similar to formula (7), formula (14) is used to calculate the repulsive force component in the direction n ij from USV j to USV i , formula (15) aims to calculate the repulsive force component in the direction perpendicular to n ij , and formula (16) is utilized to compute the repulsive force component considering the attraction from the motion target ; if d ij R o and collision may happen The OA priorities of USVs are defined before sailing off to avoid the deadlock problem in the cooperative collision avoidance process between USVs. The low-priority members give way to the high-priority USVs. Once the high-priority USV has reached the estimated position, the collision situation becomes clear. Then, the low-priority member plans for the OA path. The CPRRT* method with the goal bias strategy was used as the local planner. Since we considered the problems of OA and path refinement during the virtual leader path planning process, thus we combined the goal bias scheme that chooses the current target as samples within a probability threshold (0.3). 41

Experimental environment setup
Simulations were conducted on a computer with Windows 10 64 bit OS, 16 G RAM, and Intel(R) Core(TM) i7-7820HQ CPU @ 2.90 GHz.
The OPP parameters are presented in Table 2, where d c is the fixed local planning time and Dt is the interval between control steps, and a local path generally contains more than 50 control steps. PR is the radius of the sensing domain, r o is the minimum safe distance between USV and obstacles, and g RRT * is used in a sample's near neighbors searching process of RRT*. The maximum speed of our USV is 5 m/s, the maximum translational acceleration is 1.5 m/s 2 , the maximum lateral acceleration is 10 m/s 2 , and the minimum turning radius is 2 m.

Ablation experiments
Expressive simulation results. We performed the APF method, baseline RRT* method, and heuristic RRT* method to understand the contribution of each component to CPRRT* performance. The units of the following figures are kilometers. Figure 9 shows the planning result of the proposed APF method in the Monte Carlo experiment, where the locations of the obstacles are randomly changed. The speed of USV is set to be 5 m/s without obstacles nearby, and the OA speed is set to be 2.5 m/s. The direction of the velocity coincides with that of the path section. Figure 9 indicates the possible planning results in the Bug Trap environment. The environment consists of many traps formed by irregularly shaped and concave obstacles, and it also has low visibility. The experiment aims to illustrate that the simulated environment is rather complex for APF-based and RRT*-based planners. Figure 9(a) shows that APF planning easily falls into the local optimum. Figure 9(b) indicates that aggressive turns may exist on the path. In addition, our repulsive force calculation in terms of radar readings is practical. Figure 10 illustrates the results of the ablation experiments where the effectiveness of the heuristic modules is illustrated. The heuristics of CPRRT* include the potential field-based sample directionalizing, subgoal selection that refers to the global path, and path tree topological structure refinement by rewiring. Figure 10(a) shows the planning results of CPRRT* with all of the heuristic modules. The red solid curve is the planned path. The green-and blue-dotted curves are the original paths captured by APF and the simplified path which is smoothed via the Dubins curve, respectively. The tree-shaped lines show the path tree branches during the path searching process. The green stars denote subgoals.
The reference path is queried along the direction of the gradient descent of the potential field. Thus, it is a highquality path in terms of static environment information. The CPRRT* path deviates not far from the reference path. It is short with a small number of gentle turns. Most sections of the path conform to the potential vectors. Therefore, the CPRRT* path is probably a low-cost path.
The small number of tree branches illustrates that the path searching process is efficient because of the heuristics. Figure 10(a) describes the potential vectors built according to the given environment information. The vectors can guide the OPP process to collision-free spaces to improve the efficiency of the RRT*-based planner. The path tree exploitation by rewiring helps refine paths.   Figure 10(b) deviates further from the reference path with many turns, including even the aggressive turns in the middle of the subfigure. That observation shows that the path is probably costly. Figure 10(c) shows the planning results of the potential function-based RRT* (PRRT*) method. PRRT* uses the repulsive potential field without considering the USV velocity. Similar to the studies of literature, [43][44][45] the sample adjustment does not consider the attractive force from the subgoals. The path is longer with more turns than the path in Figure 10(a). Aggressive turns also exist, as shown in the middle of the subfigure. The probable reason is that PRRT* does not consider the attractive force from the subgoals, and the repulsive force pushes the path away from obstacles as far as possible. The path tree of PRRT* has no absolutely random sampling process. Thus, the planner may not be exploring space as comprehensively as CPRRT*. Therefore, the path quality may be lower than that of CPRRT* in dynamic environments when the planning time is quite limited. Figure 10(d) expresses the planning result without the subgoal selection. Moreover, the planner and the attractive potential field consider only the global goal. As shown by the broadly growing tree branches, the planner attempts to search for the path by widely exploring the space, possibly causing the path to be more costly than the path of CPRRT*. However, the planner can quickly find a path with the guidance of the potential field, as shown by the small number of branches. Figure 10(e) describes a path of the cooperative potential function-based rapidly exploring random tree (CPRRT) method. Unlike CPRRT*, CPRRT does not refine paths by rewiring. CPRRT also does not use the guidance of subgoals.
The path of CPRRT in Figure 10(e) is probably shorter but more costly than that in Figure 10(d) because the former has more sections with larger angles with the potential vectors than the latter has. Moreover, the path has a large number of turns in the middle of the subfigure. This scenario is possible because the CPRRT has no path rewiring mechanism.
Figure 10(f) shows the result of CPRRT* when the reference path is blocked because the environmental information changed. The result demonstrates the OPP ability of CPRRT*. The online planned path can avoid the pop-up obstacle. The local potential vectors also change near the pop-up obstacle. These observations express that the proposed local potential field modeling method is practicable. Figure 11 shows the results of CPRRT* for different local path planning iterations. The planner deleted the subgoal selection module to demonstrate the asymptotic optimality and probabilistic completeness of CPRRT*. Thus, it did not use the guidance of the reference path. The results can illustrate the relationship between planning time and resulting path quality. Figure 11(a) shows that, although the planner can find a path within 2000 local planning iterations, the path is not a low-cost one. Figure 11(b) expresses that the resulting path is probably better than the path in Figure 11(a) because the planning iterations increase. The reason is that CPRRT* continues refining paths in the newly added time. Figure 11(c) shows that the path becomes better than the path in Figure 11(b), along with an increase in the iteration times. This observation demonstrates that the CPRRT* method has the asymptotically optimal nature derived from RRT*. Figure 11(a) to (c) illustrates that CPRRT* can plan different path styles. As the iterations increase, the path tree explores the space extensively. Then, many kinds of paths are considered by the planner, and the resulting path is refined adoptively. The asymptotic optimality and probabilistic completeness are kept because randomness remains in the planning process of CPRRT*. Figure 12 illustrates the proposed cooperative paths planning process. Figure 12(a) supposes that the virtual leader moves along the online planned path, as shown by the elliptic bounding boxes. The virtual leader is supposed to be located at the boundary of the sensing domain of the USVs on the path at the beginning of each local path planning iteration. The direction of the virtual leader velocity is estimated in terms of the geometric property of the path. We assume that the communications between USVs are available.
The planners are guided by the cooperative potential field, the defined collision avoidance priority, and the real-time USV statuses to deal with the cooperative OA problem. The dotted lines in Figure 12(b) denote the USV paths. USVs maintain the triangle formation when they are far from the obstacles. They cooperatively transform into  the linear formation to avoid collision when they become close to obstacles, as shown in the middle and the right top of the subfigure. Then, they restore the triangle formation.
The planning objective is to minimize the sum cost of all of the paths. However, the safety of the OA path is also crucial. Thus, the planners may balance safety against cost. The magenta path in Figure 12(c) describes that a long path is planned for the safety of the cooperative OA. The magenta path in Figure 12(d) also expresses that USVs avoid obstacles individually, and they can break the formation for cooperative OA. This scenario illustrates the flexibility of the distributed OPP framework. Figure 13 shows the solutions of our planner in Monte Carlo experiments, which are employed to evaluate the efficiency and the online planning capability of the planner in dynamic environments. The cluttered, irregularly shaped obstacles with many narrow passages challenge the planner.
The online planning ability of our planner is testified for the following reasons: the planner can find OA paths according to the real-time environmental information, and the distributed planner in each USV can find a relatively low-cost path by flexibly changing the formation. The high efficiency is shown by the relatively low number of extension path-tree branches and high path quality that is certified by the few and gentle turns on paths.
A time-varying sea current is generally set to be clockwise in the northern hemisphere in Figure 14. The current simulated below refers to our study in Wen et al 42 Attempts are made to plan paths in the open sea. Thus, the primary interference of the USV navigation is the timevarying sea current. However, the real-time sea current can be measured using the current meter. The online planner should then return low-cost OA paths according to the realtime sensed sea current information. The downstream sections of a path help save energy, whereas the countercurrent or transverse-current sections add extra cost for conquering the current.
Given that the space from the planning start point and end point is set to be without obstacles, the potential field vectors are replaced using the sea current vectors in formula (8). Then, the planning is guided by the current. In practical occasions, the potential field is considered in sample adjustment if obstacles exist because the OA task is prior to the current handling task. The current is considered instead of the potential field in sample adjustment when no OA task is to be conducted. However, the CPRRT* planner always refines paths by path tree rewiring to consider the sea current. In both subfigures, the paths consist of many downstream sections and few countercurrents or transversecurrent sections, possibly demonstrating the online path refinement ability of our planner. That is because the quality of paths is ensured by asymptotical path refinement.
In Figure 15, the dynamic ship-avoidance problem is considered. The path extension process is guided by the potential field vectors whose directions are determined by the collision avoidance rule, namely, COLREGs. Thus, the planning is guided by the potential field to avoid the dynamic ship. Experiments are conducted in four scenarios, as shown in Figure 15. Figure 15(a) shows the USV meets the ship. Figure 15(b) shows the USV overtakes the ship. Figure 15(c) and (d) shows the dynamic ships crossing the USV from the left and right, respectively. The polygonal red curves are the collisionavoidance paths. The red pentagons illustrate USVs, whereas the green ones illustrate ships.
The USV and ship are assumed to obey COLREGs. Then, the potential field vector directions are adjusted in terms of the encounter scenario, as indicated in Equation (7) and Figure 5, to guide the path extension directions of  STDEVP: standard deviation; CPRRT*: cooperative potential function-based rapidly exploring random tree; PRRT*: potential function-based rapidly exploring random tree; APF: artificial potential field; WO: without; RRT*: rapidly exploring random tree.
our planner. Thus, the dynamic ship-avoiding paths that conform to the potential field vectors can be planned.
Quantitative results. We conducted 100 Monte Carlo experiments on different path planning schemes to compare the proposed CPRRT* method with traditional methods and understand the contribution of each heuristic component to the CPRRT* performance. Figure 13 shows the dynamic environment for this experiment.
The statistical results listed in Tables 3 to 5 are consistent with the results shown in Figures 10 to 13. We captured the time of searching for each subpath once the path tree reaches the subgoal or the local planning time expires. The path searching time indicator means the sum of all subpath searching times.
The path cost and path length were obtained after the whole planning expired. If no path was returned in the given planning time, then the planning was regarded as a failure. The standard deviation (STDEVP) reflects the stability of a method. Table 3 lists the quantitative results of planning for a virtual leader's path. The path searching time of APF is lower than 1 s, demonstrating the applicability of the radial repulsive force computation scheme. However, the planning process can be easily trapped into the local optimum. Thus, it cannot be used individually in dynamic environments.
However, the APF path is optimal in terms of the potential vectors. Thus, the smoothed APF path is utilized as the reference path for selecting subgoals. The values of path cost and path length of the APF path are regarded as the best when considering obstacles. The results of PRRT*, APF, and RRT* are considered the ground truth against which we benchmark the CPRRT* and CPRRT methods.
Although the path searching time of CPRRT* is more than 5 s, the whole planning process contains more than five local planning procedures. A local path generally contains more than 50 control steps. Thus, the waypoint update time is probably lower than 0.1 s, which meets the control requirement, as shown by Dt in Table 2.
The path searching time of CPRRT* is slightly lower than that of PRRT*, possibly because the potential field of PRRT* does not consider the attractive potential field from the subgoals. The time STDEVP of CPRRT* is higher than that of PRRT*, mainly because CPRRT* attempts to find the OA path according to the real-time velocities, causing the large fluctuation of path searching time. However, this scenario possibly results in the superior performance indexes of CPRRT*, compared with those of PRRT*. These performance indexes include path cost, path length, path cost STDEVP, and path length STDEVP. The path length and path length STDEVP of CPRRT* are close to those of PRRT*. However, the path cost and path cost STDEVP of CPRRT* are superior to those of PRRT*. The reason is that PRRT* is guided by the pure repulsive potential field without considering the USV velocity, probably causing aggressive turns on the OA path of PRRT*, as shown in Figure 10(c).
The performance indexes of CPRRT are worse than those of CPRRT* and PRRT*. These performance indexes include path searching time, path cost, and path cost STDEVP. The main reason is that CPRRT is inclined to search for a path by extensively exploring the space rather than refining the path tree. This finding illustrates the contribution of the rewiring module of RRT*.
The abbreviation "WO" means "without". The CPRRT* without the guidance of the potential field obtains longer path searching time, higher path cost, longer path length, larger failure rate, and larger STDEVP values than those obtained by CPRRT* with potential field heuristics. This finding demonstrates that the planning process of CPRRT* without potential field heuristics becomes increasingly random and unstable, verifying the importance of the potential field guidance module.
The method without a subgoal means that the CPRRT* uses no reference path information, and it only extends the STDEVP: standard deviation; CPRRT*: cooperative potential function-based rapidly exploring random tree; PRRT*: potential function-based rapidly exploring random tree. path tree toward the global goal. The performance indexes of CPRRT* without using the reference path are much inferior to those of the method with a reference path. The reference path is planned using the given environmental information, thereby guiding the planner to extend the path in terms of historical experiences. The method without a reference path must explore space extensively to capture a path, causing long path searching time, long path, and high path cost. Its failure rate is also high because traps exist in our environment, impeding the path planning process. However, the performance indexes of CPRRT* without potential field guidance or without a reference path are still much better than those of RRT*. RRT* explores the space randomly without assessing the probability of containing a path in a subspace, causing the exploration easy to trap in the concave obstacles. Table 4 presents the results when CPRRT* is required to iterate for given times. The environment is shown in Figure 11. The increment of the path planning time is not linear to that of the iteration times. That is probably because the algorithm attempts to explore space extensively as the local iteration times increases, causing the path tree to be easily trapped in concave obstacles, as shown by the intensive tree branches in traps in Figure 11.
However, the increase of iterations also results in a higher path quality compared with the results of APF. As presented in Table 3, the APF results are regarded as the ground truth value of the optimal path. The path quality is rather high as the local iteration times are 10000. The observation probably demonstrates the asymptotic optimality of the CPRRT* method. Table 5 presents the results of path planning for USVs in a formation where the environmental information is changing, as shown in Figure 13. We gain the local path searching time after a local path reaching the subgoal is found. Both the CPRRT* and PRRT* methods can meet the time requirement of control, as shown by the local path searching time and time STDEVP values. The path cost and path length values are higher than those of the virtual leader's single path in Table 3 because USVs are required to consider the collision-avoidance problem between each other. The STDEVP values are relatively small, certifying the stabilities of the methods. The paths are high-quality, as shown by the results.
The results illustrate that the CPRRT* method outperforms PRRT* in our Bug Trap environment. CPRRT* is applicable because it performs well for planning multiple paths of USVs.

Conclusion
In this article, the proposed CPRRT* method shows promising results in solving the online cooperative paths planning problem for USVs in dynamic environments. Simulations and ablation experiments were performed to illustrate the effectiveness of the heuristic modules. A potential field was proposed by additionally considering the velocities of the USV and the DO, and the COLREGs. The performances of CPRRT* and PRRT*were compared. The experimental results verified that the new potential as well as the sampling adjustment mechanism of CPRRT* outperformed those of PRRT*. The repulsive potential force computation scheme was presented in terms of radar readings, and experiments were conducted to verify its applicability. The USVs' paths were planned by the heuristics of a formation, keeping a potential field to follow the virtual leader. Experiments were performed to certify that the method can flexibly solve the cooperative paths planning problem. Simulations were also performed in dynamic environments to confirm the online planning capability, the sea current handling capability, and the dynamic shipavoidance ability of the CPRRT* planner.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.