Nonparametric Bootstrap-Based Multihop Localization Algorithm for Large-Scale Wireless Sensor Networks in Complex Environments

This paper presents a nonparametric bootstrap multihop localization algorithm for large-scale wireless sensor networks (WSNs) in complex environments. Unlike most of the existing schemes, this work is based on the consideration that it is not feasible to obtain a lot of available distance measurements sample for estimation and to get exact noise distributions or enough prior information for conventional statistical methods, which is a situation commonly encountered in complex environments practically. For the first time, we introduce a nonparametric bootstrap method into multihop localization to build confidence intervals for multihop distance estimation, which can eliminate the risk of small sample size and unknown distribution. On this basis, we integrate the interval analysis method with bootstrap approach for ordinary nodes localization. To reduce the computational complexity, boxes approach is utilized to approximate the irregular intersections. Simulation results show that our proposed scheme is less affected by the variation of unknown distributions and indicate that our method can achieve high localization coverage with relatively small average localization error in large-scale WSNs, especially in sparse and complex network with smaller connectivity and anchor percentage.


Introduction
In the last few years, there has been a rapidly growing interest in extensive monitoring applications of wireless sensor networks (WSNs). One important reason is that, as opposed to traditional solutions, WSNs can be rapidly and randomly deployed in a large-scale monitoring region by plane or unmanned aerial vehicle (UAV), of which the environments of monitoring region are always complex even inaccessible.
As new technologies, WSNs can provide the means for long-term, all-weather, real-time, accurate, and extensive monitoring unattended, which brought us a convenient way to sense and monitor the complex environments. It is considered as an ideal system for this type of extensive environment monitoring and can fulfill the needs of various practical applications, for example, marine surveillance, ocean scientific exploration, and commercial exploitation.
For most WSNs applications, localization service is an indispensable part and an essential task. The location of the sensor nodes should be determined for meaningful interpretation of the sensed information. In recent years, a certain amount of research work has been conducted in this interesting research area, and a comprehensive survey is provided in [1][2][3] and the references therein. Some of the solutions are mainly designed for small-scale networks, and although this is also an interesting field, we will not contribute to the region in this paper; instead, we focus on the large-scale localization context.
As we know, out of the consideration of economic rationality, generally, it is too hard to deploy the sensor nodes in a large-scale area with relatively high density and large percentage of anchor nodes. If it is used for monitoring ocean environment, the network will be deployed sparsely and only a minority of anchor nodes can be employed for localization. In 2 International Journal of Distributed Sensor Networks these cases, it will be harder for ordinary nodes to communicate with enough anchor nodes directly so as to measure distances, which are necessary for localization.
To solve this problem, three types of localization schemes are proposed for large-scale localization, namely, mobile anchor assisted localization algorithms [4][5][6], recursive algorithms [7][8][9], and multihop algorithms [10][11][12], respectively. Hereinto, mobile anchor assisted localization methods commonly employ some expensive mobile equipment such as Autonomous Underwater Vehicle (AUV) which can roam across the monitoring region to assist localization. In recursive algorithms, some ordinary nodes that have been localized become secondary anchor nodes and broadcast coordinates to assist other nodes in estimating their locations. The localization process will inevitably suffer from the delay and adverse effects of error propagation and accumulation. Whereas for multihop localization, the ordinary node can infer the distances to its nonneighboring anchor nodes by approximating the length of the shortest path to the Euclidean distance, so as to get enough anchor nodes with known distances for localization. They not only can provide better realtime performance but also have no additional equipment requirement. In comparison, the characteristics of multihop methods are more suitable for large-scale WSNs localization. Therefore, multihop localization method has received more and more attention in recent years.
Certainly there are still many challenges for the multihop localization, especially in the complex environments. Here we define the complex environments as multiple complex terrain conditions (e.g., forest, rocks, marsh, underwater, etc.), random and sparse sensor node distribution, irregular radio propagation pattern, various unknown ambient noises, and multiple anisotropic network situations (e.g., H-type network, C-type network, etc.). There is no doubt that these adverse factors will seriously affect the accuracy of the multihop distance estimation which is the basis of the accurate sensor location estimation. To address this problem, most of the existing schemes have adopted conventional statistical methods and considered distance measurements affected by normal distribution noise and then determined the localization uncertainties through Monte Carlo analysis or nonlinear transformation techniques. For example, in [13], a dynamic localization scheme, the Monte Carlo localization (MCL), has been proposed. In [14], based on the discovery that the multihop distance errors obey normal distribution with various biases, a multihop localization algorithm that incorporates the distance estimation bias has been presented. But for the previous methods, only when the measurements sample size is large and the precondition that the errors obey normal distribution is satisfied, they could perform well in online localization.
However, for the large-scale WSNs in complex environment, it is too hard to get exact distribution characteristics of distance measurements and enough a priori information which is necessary for conventional statistical estimation method, and it is impossible to guarantee repeatable ranging conditions for reproducible measurements so as to obtain a large number of available measurements sample for Monte Carlo analysis or other similar methods. Hence, there is a need to develop a novel algorithm for producing coordinates through multihop localization approach as only very few measurements are available. And it is better that this methodology only needs less prior information and even no statistical assumptions on disturbances.
On the basis of analysis on the challenges of complex environments and limitations of existing studies for large-scale localization, a different approach is proposed and investigated, that is, a multihop localization method based on nonparametric bootstrap for large-scale WSNs in complex environments.
The bootstrap is a powerful technique for assessing the accuracy of a parameter estimator in situations where conventional techniques are not valid. The nonparametric bootstrap method was originally introduced by Efron in [15] and used to address CI estimate for statistics based on independent and identically distributed (i.i.d.) random variable from some unknown distribution ( , ) [16]. Compared with conventional techniques, the significant advantages of the nonparametric bootstrap method are that it does not require any modeling or assumptions on the data and it is more suitable for small sample estimation. In this paper, it can be utilized to build confidence intervals effectively for multihop distance estimation online. On this basis, with interval methods, we can compute the bounds of the possible solutions that correspond to measured quantities and determine the guaranteed regions that involve the correct solution.
This paper is organized as follows. Section 2 introduces the related works and challenges for large-scale WSNs localization. Section 3 presents in details the nonparametric bootstrap and interval analysis method. Section 4 evaluates the performance of our algorithm through experiments, and Section 5 concludes this paper.

Mobile Anchor Assisted Localization Algorithms.
To solve the localization problems of large-scale WSNs, in [4], the authors employ a single mobile beacon to aid in localization. The sensor locations are maintained as probability distributions that are sequentially updated using Monte Carlo sampling as the mobile anchor node moves over the monitoring area. This method relies on the more powerful anchor to perform the calculation and relieves most localization tasks from the less powerful ordinary nodes.
The authors in [5] proposed a range-free localization scheme for WSNs using mobile anchor nodes equipped with four directional antennas. Therein, each mobile anchor node can determine its position via Global Position System (GPS), and then it broadcasts its coordinates as it moves through the region. The ordinary nodes detect these messages and utilize a simple processing scheme to determine their own coordinates based on those of the anchors.
In the AUV-aided localization scheme proposed in [6], AUVs keep roaming across the underwater sensor field to aid in localization. The AUVs can get coordinates from GPS while floating periodically and dive into a fixed depth and navigate through a predefined route using compass and deadreckoning. They can estimate the distance between the AUV International Journal of Distributed Sensor Networks 3 and ordinary nodes through a request/response message pair exchange which contains its coordinates. Therefore, the ordinary nodes can be localized after the message exchange from three different noncoplanar AUV locations.
Although these protocols exploit the mobility of the mobile equipment to overcome the lack of adequate anchor nodes, the major drawbacks of these mobile anchor assisted localization algorithms are that the mobile machine is too expensive for WSNs, and the slow speed of AUV or other machine always introduces high localization delay. In addition, the movement of the anchor nodes will be severely restricted by the complex environments. [7] presented an error control mechanism for recursive localization based on the characteristics of node uncertainty and the active selection strategy of anchor nodes. The error control mechanism only utilizes local knowledge and can mitigate the effect of error propagation for both range and directional sensors to a certain extent.

Recursive Localization Algorithms. Liu and Zhang
Yu et al. proposed a two-stage localization approach in [8]. Firstly, localization process starts from the nodes with the largest numbers of neighbor anchors which have the priority. Then, the coordinates of all neighbor nodes are exploited to improve localization accuracy. During the procedure, a number of measures are also taken to ensure the reliability of each location estimate to avoid abnormal errors and reduce error propagation.
Vemula et al. [9] formulated the sensor localization from a probabilistic point of view and incorporated anchor position uncertainty to estimate the distribution of node coordinates, including iterative least squares and Bayesian methods, Monte Carlo importance sampling, and cost-based methods.
These schemes try to inhibit the propagation and accumulation of localization errors, but the high computational complexity and increased communication cost limit their application in practice. Additionally, this kind of method is also easy to cause the localization delay.

Multihop Localization
Algorithms. DV-distance and DVhop algorithms, as the origination of multihop localization schemes for WSNs, are proposed in [10]. In both algorithms, each anchor node broadcasts a message to its immediate neighbor nodes firstly. Then, the message is propagated in a controlled flood manner so that each ordinary node can estimate the lengths of shortest paths to anchor nodes. When the ordinary node obtains the estimates to enough anchor nodes, its position can be calculated.
Wang and Xiao [11] presented an improved multihop algorithm called i-Multihop which has higher computational complexity. At the beginning, the upper bound constraints are used to filter out the incorrect distance estimations and the estimated position is pinpointed to the intersection constrained by the correct distances. And then, the distance fitting is used to fit correct distance measurements, which makes the final estimated position not to be affected by the layout of anchor nodes.
The authors in [12] proposed three multihop localization schemes based on least squares and multilateration, namely, Taylor-LS, weighted Taylor based least squares, and constrained total least squares, respectively. Additionally, a generalized Cramér-Rao lower bound is developed to analyze the performance of multihop localization approaches.
As we can see, most of the previous schemes considered the measurement uncertainty as normal distribution even zero-mean normal distribution noise and employ conventional error-processing approaches, for example, nonlinear least squares estimator (NLSE), and so forth. However, in complex large-scale WSNs, it is practically too hard to fulfill the requirements which are necessary for these approaches, such as enough a priori information, exact distribution characteristics, and large sample size.
Specifically, for the ordinary nodes, there are not so many anchors that can communicate with them to accomplish distance measurement. The ranging process also cannot be performed again and again due to the limited energy of sensor nodes. Even when cost is not a main concern, it is also impossible to guarantee repeatable conditions for reproducible distance measurement. Hence, it is not practically feasible to obtain a lot of available distance measurements sample for estimation. Moreover, in complex environments, for example, underwater environments, the ranging process will suffer from various non-Gaussian noises, for example, the multipath and waveguide effects, surface scattering, and so forth. Obviously, for the previous noises, it is too hard to get exact distribution characteristics and enough prior information which are necessary for statistical methods. It is also unreasonable to consider that the uncertainties always obey normal distribution. Hence, in this paper, the bootstrap localization method is proposed for large-scale WSNs in complex environments.

Nonparametric Bootstrap-Based Localization Algorithm for Large-Scale WSNs
To solve the problems mentioned earlier, in this paper, we propose a novel multihop localization scheme which integrates a nonparametric bootstrap distance estimation method with an interval analysis approach. We firstly utilize the nonparametric bootstrap method to provide a confidence interval (CI) as the estimation for distance measurement. On that basis, interval analysis approach is employed to determine the feasible set and search the optimal estimation of coordinates.
3.1. Short Review of CI Estimate. As we know, CI is an interval about the measurement result within which the values that could reasonably be attributed to the measurand may be expected to lie with a given level of confidence. A common approach to address CI estimate based on the normal distribution can be described as follows. Let 1 , 2 , . . . , be i.i.d. Gaussian random variables with known from distance measurements, and suppose that we wish to find a confidence level 1 − interval for the mean . As we know, the random variable (̂− )/( / √ ) obeys standard normal 4 International Journal of Distributed Sensor Networks distribution, and its probability density function (PDF) ( ) can be computed by and its cumulative distribution function (CDF) ( ) can be computed by For a given confidence level 1 − , there exists constant > 0 such that It follows that constant satisfies ( ) = 1 − ( /2), and thus, = = −1 (1 − ( /2)), where −1 has been tabulated. Equality (3) can be rewritten as is the CI corresponding to confidence level 1 − . However, must be practically estimated from the distance measurements, and this method performs well only in the case when is large as per the central limit theorem. In the case where is small ( ≤ 30), this method could be invalid.
If sample size ≤ 30, the CI based on the Student'sdistribution is often utilized to replace the previous method. Under the circumstances, the CIs can be obtained as the following.

Equality (8) can be rewritten as
is the CI corresponding to confidence level 1 − . There into, botĥand can be estimated from the distance measurements.

CI Based on Nonparametric Bootstrap.
In most studies, the CI is commonly and directly obtained through the previous two methods, based on the hypothesis that the distance uncertainty obeys normal distribution and the sample size is sufficiently large. However, as we mentioned before, it is not the case for large-scale WSNs localization in complex environments. When improving accuracy in conventional methods is invalid, bootstrap method can be used to address the CI estimate issues. The CIs of distance measurements can be estimated by employing the nonparametric percentile bootstrap method as follows. Let̂ * represent the 100 th percentile of bootstrap replicationŝ * (1),̂ * (2), . . . ,̂ * ( ). Percentile limit̂l ower , upper of intended coverage 1 − 2 , is directly obtained from the following percentiles: The nonparametric percentile bootstrap methodology to determine the upper and lower bounds on the CI for distance will be described in detail in the bootstrap distance estimation subsection.

Nonparametric Bootstrap-Based Distance Estimation
Approach. The basic concept of bootstrap method is to produce a large number of independent bootstrap distance estimates by resampling the original distance estimate; that is, = ( 1 , 2 , . . . , ), which consists of measurements at random from unknown probability distribution .
A bootstrap resample * = ( * 1 , * 2 , . . . , * ) is obtained as a random sample of size randomly drawn with replacement from the original measurement set . Sample has true mean ; drawing various other samples from the distribution , their means will present distribution̂. The distribution of real estimated mean̂is approximated by the distribution of pseudoestimated mean̂ * from bootstrap resample * . Let = ( 1 , 2 , . . . , ) represent the set of the corresponding distance measurements. Using the nonparametric percentile bootstrap method on this set, we generate resamples; * = ( * 1 , . . . , * ), = 1, . . . , . And then, we calculate the mean of all measurements in * to obtain̂ * * given bŷ * where is often smaller than 5 (which is a reasonable value and common situation in complex WSNs localization practically), and = 1, . . . , . The mean of this distribution is given aŝ * We sort bootstrap estimateŝ * * in an ascending order. Let the sorted distance be given bŷ * * 1 The desired 100 ⋅ (1 − )% nonparametric CI for the distance is given by (̂ * ( 1 ) ,̂ * ( 2 ) ), where 1 is the integer part of /2, 2 = − 1 + 1, and 3 = /2 is the integer part of /2. Furthermore, note that * 1 , 2 , and 3 are referred to as the upper, lower, and middle, respectively.
Finally, we can obtain the interval as the CI of distance corresponding to confidence level 1 − . Median point̂ * ( 3 ) ≅̂ * is almost identical to the mean. The upper, middle, and lower points are used by the algorithm to estimate the CIs for the distance, respectively. For = 0.05 and = 1000, we can get 1 = 25, 2 = 976, and 3 = 500. And the CI corresponding to confidence level 95% can be described as [̂ * (25) ,̂ * (976) ].

Determination of Coordinates Based on Interval Analysis
Method. When we obtain the CI from a small sample set of distance measurements, the interval analysis method will be utilized to determine the positions of ordinary nodes. Consider a network consisting of anchors with known positions and ordinary nodes which need to be localized. The location of node can be described by X = [ , , ] . The CI of measured distance from the ordinary node to one of its anchor nodes can be obtained through nonparametric bootstrap distance estimation approach, and the Euclidean distance can be defined as  Obviously, the desired position estimate is guaranteed to be bounded by an admissible space which can be denoted as ( ) is the intersection of all the spherical caps of center and of internal and external radii =̂l ower and = upper , respectively. Clearly, ( ) has a complicated geometrical shape (see Figure 1).
To reduce the computational burden, boxes approach will be utilized to approximate the irregular intersection of all feasible solutions. For each anchor ordinary pair, their admissible space represents a region between two boxes of length √ 2 and 2 , respectively (see Figure 2).
As we know, interval analysis is usually used to model quantities that vary around a central value within certain bounds. With simple operations, the interval analysis allows to consistently deal with problems involving interval data. Based on the interval analysis theory, the admissible space 6 International Journal of Distributed Sensor Networks of to can be denoted by where is a closed and connected subset of R 3 , and − and + are the minimal and maximal bounds of , respectively. Set theory operations, such as intersection, can be applied to intervals. Consider two intervals 1 and 2 , and their intersection is always an interval which can be denoted as The intersection can be computed by Specifically, the area encircled by the circumscribed and inscribed squares as shown in Figure 2, that is, 's admissible space to , can be computed by: Using the same previous method, we can get 's another admissible space. Finally, the feasible set , an interval vector set contains all the possible coordinates, can be obtained by simple enumeration of the intersection points among the two boxes.
Therefore, the feasible set ( ) of ordinary node can be rewritten as Clearly, the intersection of all subfeasible set is a set of boxes. Regarding the coordinates of all subboxes' centers as samples of X , we can get a sample set and the centre of Θ can be found by * = ( − + + )/2. The optimum point estimate, that is, the desired coordinates, can be obtained bŷ Finally, the coordinates can be given aŝ

Performance Evaluation
In this section, we conduct extensive simulations to evaluate the performance of our proposed localization scheme, called Nonparametric Bootstrap Based Multihop Localization Algorithm (NBMLA). All simulations are run in Matlab R2012a. To reduce the influence of outliers, we take the average of 100 simulation runs as the final data points.

Simulation Settings.
In our simulation experiments, 400 nodes with adjustable transmission range are randomly distributed in a large-scale three dimensional region with a size of 3000 m × 3000 m × 200 m (see Figure 3). We control the density and connectivity of the network by changing the transmission range while keeping the area of deployment the same. Different anchor percentages are also considered in our simulation. Besides our scheme, we also simulate an improved localization algorithm named Taylor-LS for comparison which is almost the same as [12]. We mainly consider three performance metrics: localization coverage, localization error, and time complexity. Localization coverage is defined as the ratio of the localized nodes to the total sensor nodes. Average localization error is the average distance between the estimated positions and the real positions of all ordinary nodes. Distribution boxplots of node localization errors show the average errors, median errors, and maximum outliers. The analysis of time complexity mainly focuses on the communication cost and computation complexity.

Localization Coverage.
In this subsection, we analyze the performance of localization coverage with different network connectivity and anchor percentage, respectively. Figure 4 shows the performance of localization coverage with changing network connectivity while anchor percentage is 15%. We can observe that the localization coverage of our scheme increases monotonically with the connectivity ranging from 4 to 13. But when the connectivity reaches a high value, the coverage rate becomes relatively large and will not change much after that. For example, when the network connectivity is 10, the localization coverage becomes 97% and then changes slower. The reason is that when the network connectivity reaches a certain point, most nodes can get enough anchor nodes by multihop approach and localize themselves.
As shown in Figure 5, the localization coverage of our proposed scheme increases with the anchor number when we change the anchor nodes from 20 (5% anchors) to 80 (20% anchors) with step 10, while keeping the network connectivity as 9. We can see that when the anchor number reaches a certain value, for example, 50 anchors, the coverage rate will not change much after that. We can also see that when the anchor rate is 15% and the connectivity is 9, our scheme can localize more than 95% nodes in the large-scale WSNs. Compared with the results of Taylor-LS, NBMLA can achieve better performance in terms of localization coverage, especially in the low anchor percentage situation.

Localization Error.
As mentioned before, in complex environment the ranging process usually encountered some non-Gaussian noises which stem from multiple factors, for example, multipath effect, multihop distance estimation, and anisotropic network condition. This makes it harder for ordinary nodes to get exact distribution characteristics of the distance measurement noises. For comparison with classical approaches and showing the performance of our algorithm in non-Gaussian noises context, the measurement noise has been assumed to follow Rayleigh distribution, with three percent of real distances as the standard deviations. In other words, the knowledge of the error bound roughly equals nine percent of real distances. This is a reasonable assumption and can be satisfied by the existing distance measurement technologies when suffering multiple adverse factors. Figure 6 plots the relationship between the average localization error and network connectivity when the anchor percentage is 15%. Besides the given Rayleigh distribution, we also simulate a normal distribution noise situation with mean value and standard deviation (10 m, 10 m) for comparison with the classical approaches relying on normal distributions hypotheses. We can observe that the average localization error of our scheme decreases significantly with the increase of network connectivity. It should be noted that our scheme can achieve relatively high localization accuracy even with low network connectivity. This indicates the good localization performance of our proposed scheme in sparse region. And it can be seen that our scheme outperforms the Taylor-LS method in both Gaussian and non-Gaussian cases. It indicates that our scheme is less affected by the variation of distributions.
In Figure 7, we vary the anchor percent, which is ranging from 5% to 20%, and get the accuracy comparisons with different anchors. For our scheme, with the number of anchor nodes varying from 20 to 80, the localization error decreases by 40%. However, the localization error of Taylor-LS decreases only by 20%, when the network connectivity is 9. This suggests that in sparse networks, our scheme can achieve higher localization accuracy just by increasing a little number of anchor nodes.  anchor nodes and ordinary nodes, the overall communication complexity of our scheme is ( ). Firstly, NBMLA needs the times flooding initiated by anchors to make the ordinary nodes know their hop counts, whose overhead is ( ). Besides the flooding, our scheme requires each anchor to propagate its hop counts to three or four hops' neighboring nodes by a confined flooding. This also has the overhead of ( ). The communication complexity of Taylor-LS equals to that of NBMLA. But when the algorithms need more distance samples for parametric statistics or location refinement, our scheme can achieve much lower communication cost compared with Taylor-LS. This is due to the fact that NBMLA obtained lots of distance samples by bootstrap resampling approach rather than repeating communication with anchors which introduces a large communication cost for Taylor-LS. This characteristic is particularly important when the network is sparse. Of course, in this case, the computation complexity of our scheme will be larger than Taylor-LS, because the resampling process is necessary when the original distance sample size is very small. For example, if there are only 5 effective distance measurement samples, the NBMLA needs to resample at least times (>200) for estimating parameter and constructing the CI. Thus, the computation complexity of NBMLA is ( ). Fortunately, in our scheme, the location estimate can be obtained by simple enumeration instead of complex matric arithmetic. And compared with the communication cost, the computation cost of our method is moderate and acceptable.

Conclusions
In this paper, we present a novel multihop localization algorithm which integrates the nonparametric bootstrap method with interval analysis approach for large-scale WSNs in complex environments. In the first phase, the nonparametric bootstrap method is utilized to build confidence interval of distance form a small sample set of available measurements with unknown noise distributions. On that basis, we adopt interval analysis approach to estimate the positions of ordinary nodes. The simulation results demonstrate that our algorithm can get high localization coverage and accuracy while resisting against the variation of unknown noise distributions. Compared with the traditional method, the proposed method can greatly improve the average localization accuracy. Further studies are being conducted to extend the method to mobile WSNs in complex environments.