Multi-robot informative path planning in unknown environments through continuous region partitioning

We consider the NP-hard problem of multirobot informative path planning in the presence of communication constraints, where the objective is to collect higher amounts of information of an ambient phenomenon. We propose a novel approach that uses continuous region partitioning into Voronoi components to efficiently divide an initially unknown environment among the robots based on newly discovered obstacles enabling improved load balancing between robots. Simulation results show that our proposed approach is successful in reducing the initial imbalance of the robots’ allocated free regions while ensuring close-to-reality spatial modeling within a reasonable amount of time.


Introduction
Multirobot informative path planning (MIPP) 1 is a highly practical problem in robotics. It is encountered in several applications of multirobot systems, including unmanned exploration of unknown environments, robotic search and rescue, collecting samples of ambient phenomena like fire, nuclear radiation, soil moisture, or algal concentration, from difficult and hazardous environments, such as postdisaster scenarios, combat environments, forests, or water bodies, such as oceans and lakes. [1][2][3] The main challenge in MIPP is to coordinate a set of robots so that they can collect samples of the ambient phenomenon and report it back to a central location such as a base station. The robots need to perform this collection task in an energy-efficient manner so that they can collect the maximal information from the environment with the limited battery energy available with them. The problem is further complicated because the locations in the environment where the most information is available are not known a priori and the robots have to explore the environment and discover these locations in real time. Also, the robots have to coordinate their movements and collection actions so that they do not end up collecting redundant samples and are also able to avoid collisions and path conflicts with each other. 3,4 In this research, we consider another practical aspect of MIPP in the form of communication constraints-also not known a priori are the locations in the environment where the robots can form a partial or complete network via wireless connectivity to exchange local knowledge.
However, we observe that in an unknown environment, where the locations, numbers, and shapes of the obstacles are unknown, it might happen that the areas of an initial Voronoi partition are imbalanced. [5][6][7] As the robots navigate the environment and discover obstacles, their perceptions about their allocated regions change. Therefore, we posit that their allocated partitions may benefit from dynamic adjustments as each robot discovers inaccessible subregions (e.g. due to obstacles) within its current partition, over time, rendering the load of the information collection task better balanced across all robots. Robots partition the region based on current perceptions of the environment; so, if perceptions include newer information, then there is an opportunity to refine partitions upon regaining connectivity with neighboring robots. We have verified our proposed approach with different numbers of robots having different communication and light detection and ranging (LIDAR) device ranges performing the task of information collection within environments having different distributions of information samples using the Webots simulator. Our results show that the proposed approach is able to successfully reduce the difference between the optimal and initially allocated area distribution among the robots while incurring a reasonable time for up to eight robots.
A preliminary version of this work was published as a short article at the FLAIRS 2019 conference. 8 The conference paper does not contain any experimental evaluation. This article adds extensive simulation experiments and clarified algorithmic details, accordingly expanding upon the abstract, introduction, and related work sections.
The contributions in this article are as follows: To the best of our knowledge, it presents the first formalization of the MIPP problem under communication constraints in an unknown environment. It proposes a greedy informative path planning strategy combined with a novel load-balancing algorithm to recursively repartition the multiple robots' allocated Voronoi components and achieve a better load-balanced information collection system. It extensively tests our framework with varying numbers of robots as well as varying communication and laser ranges within a high-fidelity robotics simulator, Webots.
This article is organized as follows. First, we discuss the related literature. We next describe the problem under study in this article, including the initial Voronoi partitioning strategy and the greedy informative path planning technique our approach employs. After that, we discuss the recursive Voronoi repartitioning strategy and detail the experimental evaluations and simulation results. Certain limitations of our approach, and their implications in our study, are also identified. Finally, we conclude and suggest potential future directions.

Related work
One of the very first studies on the studied NP-hard MIPP problem is due to Singh et al. 1 They have proposed a branch-and-bound based budgeted path planning technique to solve the problem. This work uses Gaussian processes (GPs) 9 to model environmental spatial phenomena. GP is a popular regression model, which has been used not only in MIPP studies but also for general information collection algorithms using static wireless sensors. 10 The authors have used two metrics to study the information collection quality: entropy and mutual information. In our work, we use entropy to be our quality metric. The MIPP problem with mobile robots having periodic connectivity has been studied by Hollinger and Singh. 11 In this study, the robots have to come within each other's communication ranges periodically. Another periodic connectivity approach for information collection is proposed by Viseras et al., 12 where they have employed a sampling-based path planner. Recently, Dutta et al. 4 proposed a solution for the MIPP problem, where the robots need to maintain a connected network throughout the information collection process. The authors have used bipartite matching to avoid inter-robot collisions while collecting maximal information and u À v node separators to maintain connectivity among the robots. Our work in this study does not require the robots to maintain either continuous or periodic connectivity rather if they come within each other's contact, they intelligently share information and balance their workloads.
Greedy informative path planning strategies are highly popular in this field 2,13 and the proposed strategies also provide provable near-optimal guarantees. However, the solution is centralized and does not scale well with the number of robots. A polylogarithmic approximation bound in a metric space is provided by Lim et al. 14 A greedy information collection strategy has also been used by Dutta and Dasgupta, 15 where a group of robotic modules forms user-defined configurations while collecting maximal information from an environment. Our work in this article uses a similar greedy informative path planning strategy. A sampling-based path planner is used for information collection by Hollinger and Sukhatme, 16 where the authors have proposed a variant of the RRT* algorithm. 17 An MIPP algorithm has recently been proposed by Luo and Sycara, 18 where they used a Gaussian mixture model to better prediction results. Partially observable Markov decision process has also been used for modeling the information collection phenomena. 19 Singh et al. studied a nonmyopic version of the MIPP problem. 20 A similar strategy in a spatiotemporal field is used by Meliou et al. 21 Environment monitoring with a team of underwater vehicles is studied by Kemna et al. 22 and Lermusiaux et al. 23 In contrast to the work proposed by Woosley et al., 24 where all robots explore and select locations from the entire environment, this article partitions the environment a priori to enable robots to divide the exploration task between themselves so that they can select and visit information sampling locations more rapidly. Ma et al. 3 have used information collection techniques for ocean monitoring while employing a bipartite matching approach. However, most of these discussed studies on MIPP do not take the communication constraints of the robots, a highly practical consideration, into account. Moreover, the environment structure, that is, the obstacle locations and shapes are supposed to be known. In this article, we relax these assumptions and the robots collect information from an initially unknown environment while gracefully handling their restrictive communication ranges.
Voronoi partitioning is a scheme of decomposing an area into disjoint cells, where every point in each cell is closer to the center of that cell than any other cell. 7 Voronoi partition has been used in solving several problems in robotics. Choset and Burdick 25 use a generalized Voronoi diagram for a robot path planning problem. Zhou et al. 26 recently proposed an online collision avoidance mechanism using buffered Voronoi cells. Hungerford et al. 27 proposed to decompose the environment into Voronoi cells and each robot is made responsible for covering its corresponding Voronoi cell. Moreover, if part of one robot's allocated cell is inaccessible to that robot, other robots cover the inaccessible part of the cell. A similar area coverage strategy using a group of networked robots has been studied by Breitenmoser et al., 28 where the environment and the obstacles are assumed to be nonconvex. Coverage path planning in an initially unknown environment using a multirobot system has been studied by Guruprasad et al., 29 where the authors have proposed an online Voronoi partition-based coverage algorithm. Multirobot coverage path planning using Voronoi partitioning has been studied by Cortes et al., 30 where each robot communicates its position while dynamically adapting the partitions with nearby robots complete coverage. Recently, a Manhattan distancebased Voronoi partitioning scheme is proposed by Nair and Guruprasad 31 for multirobot area coverage. In a different work, Nair and Guruprasad have combined geodesic and Manhattan distance-based partitioning strategies for coverage in presence of obstacles. 32 Schwager et al. 33 proposed a control mechanism that allows a swarm of robots to position themselves to optimize the measurement of sensory information in the environment. A Voronoi repartitioning strategy based on the unbalanced areas of the initial partitions has been presented by Tewolde et al. 5 We use their proposed algorithm for Voronoi cell repartitioning among the communicating robots. Decentralized area partitioning strategy has been previously used by Acevedo et al. 34 for area patrolling. However, the authors modeled the solution in a way that the robots periodically come into each other's contact for sharing information, which is not required in our approach. Moreover, the article does not handle unknown obstacles in the environment.
Our proposed method aims to bring the abovementioned concepts together: how to use the idea of region partitioning for better load-balancing among the robots for information collection. The closest study to this work is due to Kemna et al., 35 which we complement and extend in the following way: Kemna et al. does not consider any obstacle in the environment, whereas our repartitioning algorithm works based on robots' explored regions and the unknown obstacles that they discover. Moreover, the work by Kemna et al. 35 assumes the environment to be restrictive for communication, whereas we consider a different but realistic constrained-communication model, where the robots have limited communication ranges while the environment itself is not communication-restrictive.

Problem setup
Let R ¼ fr 1 ; . . . ; r n g denote a set of n robots dropped from an aircraft at a given set of locations S ¼ fs 1 ; . . . ; s n g within a convex polygonal environment E & R 2 to be navigated. Environment E may include inaccessible subregions, due to static obstacles, also assumed to be convex polygons, that is, E ¼ E free [ E obst with E free \ E obst empty. All robots are initialized with common knowledge of R, S, and E but with no knowledge about the location nor geometry of the obstacles, initially assuming E ¼ E free . Inaccessible regions are rather discovered during navigation, each robot equipped with a laser rangefinder by which to detect (and circumnavigate, when feasible) nearby obstacles. Robots are also each equipped with localized navigation (e.g. using GPS), an information collecting sensor (e.g. a camera), and wireless radio to share location, path, obstacle, and sensor information with other robots within communication range C. Our problem setup assumes that both the radio range (for inter-robot communication) and the laser range (for obstacle detection) are greater than the sensing range (for information collection). We do not require the distances between initial locations in S to necessarily fall within communication range C, that is, the environment E itself is not communication restrictive.

Initial Voronoi partitioning
The Voronoi partition is a widely used mechanism for separating a space into nonoverlapping components based on the "nearness" concept. Given a convex polygonal region E and n points in the set S ¼ fs 1 ; . . . ; s n g, we can associate a polygonal Voronoi component E i with every point s i 2 S as the following where k q À s i k denotes the Euclidean distance between two points q; s i 2 E. Intuitively, each Voronoi component E i is the collection of those points, which are closer to s i than any other site s j 2 S. The intersection of any two Voronoi components is either null, a line segment, or a single point. Each Voronoi component is a topologically connected non-null set. This standard partitioning of the environment into n Voronoi components for n sites can be done in OðnlognÞ time. 6 Recall that all robots R are initialized with common knowledge of starting locations S and environment E, and none have initial knowledge of the obstacles and thus will initially assume that E ¼ E free . It follows that all robots will begin with a common understanding of the Voronoi partition E 1 ; . . . ; E n associated with their initial airdrop locations S. Each such component E i is then interpreted as a hard constraint on the portion of the environment that robot r i is permitted to navigate within. Also, recall that this initial partitioning makes no assumptions that the distances between locations in S are within communication range C. Of course, whenever robots fall within communication range C, information gathered during navigation (e.g. detected obstacles) can inform subsequent Voronoi repartitioning of the environment (see Figure 1).

Informative path planning
Consider a set of information collection points, or the points of interest (POI), that cover environment E in a grid structure. Each robot moves from one poi 2 POI to the other via straight lines in a deterministic manner and, upon reaching any poi, uses its sensor(s) to collect the information within that poi's grid cell. Each robot's sensing process is assumed to have negligible noise but only within the current grid cell; specifically, the information collected from the current poi will be interpreted as "ground truth" for that poi, but no direct information from (unvisited) neighboring poi's is revealed. Observe that, under these motion and sensing assumptions, at any point of time, the POI set can be decomposed into two disjoint subsets, U and V, corresponding to the unvisited and visited collection points, respectively, with equivalence to the unobserved and observed grid cells in E. Finally, each robot is assigned a budget B, which indicates the maximum number of poi's that can be visited. The objective of the robotic team is to follow maximally informative paths or equivalently minimizing redundancy in information collection with subject to the path-length budget B per robot.
We model the observable environmental phenomena using GPs, 36 which assume that all the collection locations in the environment (POI) generate information according to a Gaussian random vector X with known (prior) mean vector and covariance matrix S. By virtue of the above motion and sensing assumptions, it follows that for robot paths that render a set U of unvisited poi's, the Gaussian random vector X U characterizing the uncollected information has (posterior) mean vector and covariance matrix given by equations 9,10 with X V denoting the set of measurements sensed in the visited poi's and the prior statistics organized into the corresponding block forms with respect to U and V The diagonal elements of the covariance matrix S UUjX V in equation (2) express the posterior variances s 2 ujX V of every unvisited POI u in U. The associated posterior entropies are, in turn, given by and objectively quantify which unvisited locations in the environment, when considered individually, contain the highest amount of uncertainty, 10 that is, contribute most to the mean-square-error when predicting future information measurements via the posterior mean vector U jX V . Observe that a negative penalty value is introduced for unvisited locations rendered inaccessible by discovered obstacles, promoting compatibility between obstacle avoidance capabilities of the robots and any subsequent path planning driven by uncertainty reduction. Our multirobot planning approach to uncertainty reduction will adopt the straightforward "greedy" informative path generation strategy: each robot r i simply employs equation (3) to calculate the best neighboring location to visit (within its Voronoi component E i ) based on the measurements from all previously visited locations V and up-to-date knowledge of discovered obstacles E obst . More formally, given robot r i is currently in location poi curr , it will select informative location poi next according to with N ðpoiÞ denoting the set of all neighboring informative locations to any given poi in POI. The greedy path planning strategy has been proven to yield acceptable results in information models using the assumed spatially localized GPs. 2,35 Given the fact that each robot calculates only the next best poi to visit in every iteration (i.e. the planning horizon is 1) in a distributed manner (i.e. calculates only own path) and poi next is a neighbor of the robot's current location, the abovementioned path planning strategy will scale polynomially. 2,10 Algorithms As the robots do not have any prior knowledge about the locations and the shapes of the obstacles in the unknown environment, it may happen that the initial partitioning of the environment is imbalanced, that is, some robots have more free informative locations in its Voronoi cell and the others have less. As the robots move in the environment and collect information, they may also discover obstacles. When they later meet other robots, they share their updated local knowledge of the environment and may repartition their currently allocated Voronoi cells.

Voronoi repartitioning for load balancing
In this section, we present the load balancing algorithm used in our recursive region partitioning algorithm. This algorithm is an adaptation of the virtual force-based loadbalancing algorithm proposed by Tewolde et al. 5 Assume that during the task of information collection, a set S R of k robots get positioned such that their induced communication graph is connected. In such a situation, the load-balancing algorithm is triggered to balance the workload between these k robots. To achieve this, the algorithm performs Voronoi partitioning on the total area of the current k cells and possibly adjust the geometry of the cells along with the center of these cells. For a robot r i 2 S, its current workload w i is calculated using the following formula The underlying idea is to allocate more cells to a robot whose workload is low, thereby reducing workloads of the overloaded robots.
The algorithm with all its technical details is presented in Algorithm 1. Refer to Figure 2 for an illustration using two robots. Note that in the original algorithm, 5 the number of iterations is not specified. However, in our experiments, we have found 1000 to be a reasonable upper-bound on the number of iterations required to achieve an equilibrium in the load-balancing process; we denote this upper bound by K. Also, in our experiments, we terminate the load-balancing An illustration of the dynamic load balancing is shown using two robots r 1 ; r 2 . The line ' is the common boundary of the initial Voronoi partition of the two robots. After load-balancing, ' 0 is the new common boundary. Observe that after load balancing, r 1 has lesser area to cover, whereas r 2 has now more area to cover. The load balancing happens based on the robots' currently detected obstacles in the environment (green polygons).
process when the standard deviation of the w i reaches a value of t or less. In our implementation, we have set a ¼ 0:04, b ¼ 1, l ¼ 1, Dt ¼ 0:5, K ¼ 1000, and t ¼ 5.

Recursive region partitioning for information collection
Each robot, r i , will continue to generate a new path within its designated Voronoi cell, E i , and will follow that path until one of the following two things happen: 1. The path length covered so far by r i exceeds the assigned budget, B, 2. r i meets (i.e. comes within the communication range of) another robot r j .
While visiting new poi cells within its designated region, r i will continuously broadcast messages containing its unique ID, current location, and its currently allocated region, that is, E i , a polygonal region. If r i meets another robot, then depending on this received information from the other robot(s), they will decide whether they should repartition the union of their current Voronoi cells to achieve load balancing or they should ignore repartitioning and continue to stay within their currently designated cells.
To keep track of contacted robots and their corresponding Voronoi cells along with their discovered obstacles so far, each robot, r i , maintains a data-structure, named perception of world (PoW i ). The PoW i can be imagined as a 2D array of time-stamped objects indexed by poi's and contains information about the following: E i 's, discovered obstacle locations, and the sensed information measurements. It might happen that robot r i discovered obstacles in robot r j 's current Voronoi cell, which r j might or might not have any knowledge about. But when they meet, both of them will have the knowledge about these obstacles.
As the obstacles are assumed to be static, if PoW are to be exchanged between r i and r j , this perception will be shared by them and the corresponding repartitioning strategy will be implemented. Note that PoW i is a local data structure, that is, different robots' PoW i might be different in terms of stored content depending on which robot(s) they came into contact with. This data structure will help the robots to partition their respective regions if required in a more intelligent way as we will see next.
When to repartition? Let R i denote the set of robots that are within robot r i 's communication range. If r i meets with r j 2 R i , one of the following situations can arise: 1. r i has never communicated with r j before. 2. r i has communicated with r j before.
(a) perceptions of r i and/or r j about the other robots' partitions (PoW i and/or PoW j ) have changed after r i last communicated with r j . (b) PoW i and PoW j are still the same from r i and r j 's last communication.
We will handle these cases one by one. We will start off with the simplest case (2b)-r i and r j 's perceptions about the other robots' partitions (PoW i ; PoW j ) have not changed from the last time these two robots have met. In this case, they do not have to update their plans and/or their respective Voronoi cells and they can plan their next best poi's to visit within their current Voronoi cells (line 12 in Algorithm 2).
If r i and r j are meeting each other for the first time (case 1), they might or might not have current partition information about each other. In any case, they will repartition their current Voronoi cells based on their latest knowledge about the obstacles in their respective PoW. As the environment is unknown to the robots in the beginning and they are discovering shapes and locations of the obstacles in the environment as they explore more, the initial Voronoi cells might not be "balanced" in terms of the free space available to explore. Consequently, one robot might be tasked with exploring significantly larger/smaller amount of region in the environment compared to the others. To minimize this potential high level of variance and bring balance to the obstacle-free areas of robots' exploration regions, we use a Voronoi cell repartitioning technique described in Algorithm 1. As a result of this repartitioning strategy, the participating robots are allocated to a unique and new polygonal region (possibly concave) in the environment for information collection. The algorithm virtually shifts the sites for the Voronoi partitions on the plane such that the new cells are balanced in terms of the number of free poi cells in them and it terminates when the standard deviation of the load distribution is sufficiently low enough Algorithm 2. Recursive region partitioning for information collection (see Algorithm 1 for details). Note that the repartitioning algorithm is only executed on the area bounded by the union of the Voronoi cells of the coordinating robots ( R i ). As the robots discover more obstacles in the environment, their perception about the environment will change and this will be reflected in their continuously repartitioned regions.
In case r i and/or r j 2 R i , PoW have changed after they last met (case 2a), they will follow a similar strategy to repartition their current Voronoi cells, as prescribed for case 1. When a robot reaches its path budget limit, that is, covered B informative locations, it will stop exploring the environment any further. But it will continue to broadcast its PoW along with a final message so that the other robots can plan their partitions accordingly, whereas the stopped robot will not repartition its latest Voronoi cell (lines 3 and 4 in Algorithm 2). Proposition 1. The region partitioning approach produces complete coverage of the environment E.
Proof. Coverage of E is complete when every poi 2 POI belongs to some Voronoi cell E i of some robot r i . After the initial partitioning, [ n E i ¼ E and therefore, POI E. When a set of robots R repartition their current cells, they do the repartitioning of the union of their current cell area, that is, As a result of the repartitioning, V does not change 5 and thus, V 0 does not change. Therefore, V 0 [ V ¼ E. Hence proved.

Evaluation
In this section, we discuss the simulation settings and present the results found via rigorous simulation experiments within a 3D robot simulator Webots. 37

Settings
We have tested our proposed informative path planning approach in simulation, which is performed on a desktop computer with a 4.6 GHz. Intel Core i7-8700 processor, 16 GB RAM, and an AMD Radeon Rx 550 4 GB graphics card. We have used a simulated model of the Pioneer 3AT robot within the Webots simulator that is equipped with a GPS and a Sick LMS 291 LIDAR sensor ( Figure 3). Two different 49 Â 49 m 2 environments (shown in Figure 4) are considered, choosing initial robot locations as well as rectangular obstacles at random. We have varied the obstacle amount in the environment among 5%, 10%, and 15%. LIDAR ranges are varied between f5; 10; 15g m. The communication ranges are varied between the same distances. Each robot has the same communication and LIDAR ranges in any particular test setting. The budget is fixed to 50 grid cells in all the tests. Each robot starts off with the ground truth information measurement of 10% of the grid cells for calculating the initial values of the GP hyperparameters. The main metrics that we evaluate in this article are (1) entropy, (2) variance, (3) root mean square error (RMSE), (4) time, (5) load, and (6) amount of obstacle detected. As this is the first MIPP solution with communication-constrained robots that handles unknown environments having polygonal obstacles, we could not compare against any previous approach. In the simulation, each test is run 10 times and the average results are presented here. The error bars in the plots indicate the standard deviations of the y-axes metrics. Table 1 summarizes the different parameters used in our experiments and the different values used for these parameters.

Modeling environmental information.
Our experiments assume a 49 Â 49 m 2 grid, where the corner points comprise the 49 Â 49 poi's underlying a homogeneous, isotropic Gaussian Markov random field using exponential pairwise covariance functions: for any pair of poi's i and j at spatial locations p i and p j , respectively, let where b > 0 is the local standard deviation and ' (in meters) is the exponential rate of diminishing covariance between increasingly distant poi's. Figure 4(a) illustrates a sample from such a process with zero mean and using parameters b ¼ 1 and ' ¼ 25 m to define the covariance matrix. Observe that such a process, when b ¼ 0, will deterministically render only the mean field: Figure 4(b) illustrates such a mean field that is proportional to a fivecomponent Gaussian mixture over the 2-D spatial region, each component located at its length-2 mean vector with spherical contours determined by a 2 Â 2 diagonal covariance matrix. Note that ENV 1 and ENV 2 are comparable to the test environments used by Cao et al. 2 and Hollinger and Sukhatme, 16 respectively.

Results
In this section, we present the results obtained by our proposed approach in different environmental settings. This section is divided into six subsections, each of which discusses the results for one particular performance metric.
Entropy. First, we present the results for our optimization metric, entropy. As shown in equation (4), the objective of the robots is to collect maximal possible information from the environment through maximizing the entropy in every iteration. To decide the next "best" location, each robot finds the neighbor poi that yields the maximum entropy.
In our experiments, the robots use all the visited poi's to predict the information measurements in the unvisited poi's. The results of the entropy metric are shown in Figure 5. Our results are consistent with the findings of Cao et al. 2 and it can be observed that the greedy method cannot exploit the low spatial correlation in the environment very well. Therefore, the entropy plots do not show a consistent trend. The myopic nature of the used planning strategy reinforces it further. Our proposed framework is generic-a more sophisticated informative path planning strategy 2,18,21 can be used in place of this greedy strategy without impacting the proposed novel coordination mechanism.
Root mean square error. Root mean square error (RMSE) is a standard metric to evaluate the quality of a regression model. 9 In our case, lower RMSE indicates that the GP prediction can closely model the underlying spatial phenomena shown in Figure 4. The results are shown in Figures 6 and 7. LIDAR range is fixed to 10 m in these plots. We can observe that varying communication ranges of the robots do not have a significant effect on the RMSE metric. On the other hand, with an increasing number of robots in the environments, RMSE values go down consistently. This result is consistent with the results found by the previous information modeling strategies. 1,2 We only present the results for the laser range of 10 m as we believe it to be the best representative. For L ¼ 10, the maximum average RMSE value found in ENV 1 is 0.26 with two robots and O% ¼ 5; 10 and C ¼ 10. On the other hand, the lowest RMSE value found in ENV 1 is 0.05 with eight robots and O% ¼ 15 and C ¼ 10; 15.  (a) is obtained as a sample from the Gaussian process with the associated covariance matrix but assuming a zero mean field, while the zero covariance matrix underlying ENV 2 degenerates every sample to equal the mean field, crafted as a set of circular rises above zero in (b). 8

International Journal of Advanced Robotic Systems
Variance. Following Krause et al., 10 we are also interested in studying the posterior variance found in the diagonal elements of the covariance matrix (equation (2)). The results for this metric are shown in Figures 8 and 9. We can see that with time, the posterior variance values generally go down. The slope is usually steeper with more number of robots in the environment.   Time. Next, we show the execution time of our proposed approach with different test settings. The result is presented in Figure 10. From this plot, we can observe that with increasing robot count, the run time does not increase significantly. The slight increment is due to the potentially more repartitioning among the robots, while the budget   of each robot remains the same. The average run time with 5% obstacles in ENV 1 and 10-m laser range across all C and n is 136.89 s, whereas with 5% obstacles, it increases slightly to 138.43 s. The maximum time taken by any simulation was 479.39 s with eight robots and O% ¼ 5, C ¼ 5, which we believe to be fairly reasonable given the intractable nature of the handled problem. The execution time of the proposed approach has not varied significantly in ENV 2. The maximum run time for ENV 2 is found to be 456.48 s with eight robots and O% ¼ 5, C ¼ 10.
Load. To demonstrate the effectiveness of the use of the Voronoi repartitioning strategy, we use the following terminologies. Let OPT l denote the optimal average workload, Init l denote the initial average workload of the robots based on the initial Voronoi partitioning without considering the obstacles in the environment and let Fin l denote the final average load of the robots. OPT l is calculated using the following formula To show the effectiveness of the repartitioning approach, we calculate the differences between OPT l , Init l , and Fin l . As the initial partitioning does not take the load into account, it can be highly imbalanced. We are interested in observing how the difference with the optimal has reduced over time. The initial and final differences with the optimal load are calculated as follows The results are shown in Figures 11 to 16. The y-axes in these plots show the differences of Init l and Fin l with the OPT l , that is, Init diff and Fin diff . The maximum difference (D) between Init diff and Fin diff is 180.93% with eight robots and O% ¼ 15; C ¼ 15; L ¼ 15. Generally, we have noticed that with higher communication range, the value D goes higher, that is, the robots are better able to reduce the initial load imbalance. This is highly intuitive because with higher C, they can communicate with farther robots and the repartitioning process can happen between potentially more number of robots, which in turn lowers the load imbalance. Similarly, a longer laser range helps the robots to detect farther obstacles. Along with a higher communication range, this enables the robots to better reduce the D metric. This is also evident from our results. For example, with L ¼ 5 and C ¼ 5, the average value of D across all n and O% is 3:51% while the maximum being 98.25%. On the other hand, with L and C both set to 15, this average increases to 10.12%. When we have fixed the obstacle locations and only robot locations are varied, a better load balancing can be noticed (Figure 17(a)). However, if only obstacles are randomly generated and the robot locations are fixed (Figure 17(b)), the difference with the optimal load metric is similar when both are randomly generated (Figure 14(b)).
On the other hand, a similar trend can be noticed for different obstacle percentages in the environment. For example, with O% ¼ 5, the average value of D across all n; C; and L is 7.34%. This average value of D increases to 7.62% with O% ¼ 15. It is evident from the results that in a more cluttered environment, the repartitioning approach is more effective.
Detected obstacles. We are also interested in finding how many obstacles are detected by the robots. This is important for applications like environment mapping. The results are shown in Figures 18 to 20. The detected obstacle percentages are plotted against the laser ranges of the robots' on-board LIDARs. Higher the range, farther obstacles are detected by the robots. One should note that the results presented here represent the union of all discovered obstacles by all the robots, that is, if an obstacle is observed by more than one robot, it is counted only once.
As expected, with a higher laser range, generally, the robots were able to detect more obstacles in the environment. On the other hand, with more robots present in the environment, more obstacles are detected in general. This      23) a set of sample informative paths planned by different numbers of robots in the two tested environments. The red circles represent the sequence of cells visited by the robots, that is, paths and the black þ signs indicate the detected obstacles in the environments. As each robot's local GP model is trying to reduce the information measurement uncertainty, we can notice that the robots are usually scattering in the environment while collecting information.
Key findings. The key findings from the results can be summarized as follows: (1) Repartitioning is more effective in a cluttered environment; (2) with higher values of C and L (i.e. better sensors), load balancing is more effective; (3) with a higher number of robots present in the environment, the prediction model is better; and (4) the percentage of obstacles identified in the environment is proportional to the number of robots used.

Limitations and discussion
Our results show that when two or more robots come within communication range, they are successfully able to reduce the imbalance in their workloads by repartitioning their current Voronoi cells. Although our approach can handle online repartitioning successfully with convex obstacles, it does not work with concave obstacles. Online repartitioning with concave obstacles will require further sophisticated communication mechanism. As our model does not require the robots to form a connected network at any point in time, we cannot guarantee a complete allocation of all regions with concave obstacles present in the environment with our current approach. As the environment size increases, each robot will have a larger region to cover. Thus, it will increase the run time to repartition such environments. Furthermore, the GP-based learning would also be more time consuming due to the increased POI set-size. Our proposed approach also does not handle a couple of corner cases, as shown in Figure 24. The first scenario is depicted in Figure 24(a), where two robots r i and r k are within their communication ranges but their current Voronoi cells do not intersect with each other. As these cells are disjoint, if the robots were to do repartition their cells, the union of them would produce multiple disjoint polygons. Our algorithm does not particularly handle this scenario.
Thus, r i and r k continue their information collection work in their current cells without executing the repartitioning procedure. Secondly, if two robots r i and r j are within their communication ranges but their respective Voronoi cells intersect on a single point, the robots do not participate in the repartitioning process. As the Voronoi cells intersect on a point only, if one of the robots were to go to the other side of that intersecting point after the repartitioning, it would need to go through that single point. We do not explicitly handle this situation and the robots continue to collect information in their current cells. The scenario is shown in Figure 24(b). On the other hand, if r i and r j are within each other's communication range and r j and r k are within each other's communication range but r i and r k are outside of each other's communication range, then the repartitioning is possible through r j (Figure 24(a)).
Our work in this article is the first to study the MIPP problem under communication constraints in unknown environments. Given that there is no guarantee that two or more robots will meet each other during the information collection process, it is difficult to provide optimal load balancing. However, we have modeled the solution in a way that whenever robots meet with each other, they locally balance their loads. Although we have used a greedy informative path planning strategy that has been proved to yield reasonably good solutions in the literature, 2,4 any other sophisticated information collection strategies, 18 or mutual information-based optimization technique instead of entropy-based, 10 can be used in its place. This will not change the overall coordination strategy for the communication constrained robots in an initially unknown environment proposed in this article. This will help the future researchers in this topic to better reason about the impact of the highly practical yet less studied communication constraints of the robots, especially in environments, where the obstacle locations and shapes are initially unknown. Moreover, our proposed framework can easily be adapted for coverage path planning, that is, to cover all the free locations in the environment while handling the robot's limited communication ranges in an unknown environment.

Conclusion and future work
In this article, we have proposed a continuous Voronoi partitioning-based informative path planning algorithm to collect the maximal amount of information from an unknown environment using a set of mobile robots. In our model, the robots initially do not have any knowledge about the environment. As the robots move to collect more information, they possibly discover new obstacles and their corresponding perceptions about their allocated workloads, that is, the amount of area they need to cover for information collection, change. When two or more robots come within each other's communication ranges, they share their current perceptions about the environment, and based on their local knowledge bases, they repartition their initial Voronoi cells, which have been allocated to them without taking the obstacles of this unknown environment into account. This helps the robots to achieve a better workload balance. Our used path planning approach is distributed in nature and each robot plans its local path only for one future informative way-point at a time. Results show that our proposed strategy helps the robots to model the underlying spatial phenomenon that is close to reality while successfully reducing the imbalance in the amount of allocated free areas to the robots. Physical robot implementation and experiments will be part of future work. Details of physical implementation such as motion planning around obstacles and noise from sensors will be addressed as part of physical implementation along with appropriate techniques from handling these issues. Right now, we have made simplifying assumptions for these aspects to focus on the computational aspects of the problem that is the main contribution of the article. We also plan to incorporate various connectivity constraints proposed in the literature, for example, periodic and continuous, into our proposed framework.