End-to-end data collection strategy using mobile sink in wireless sensor networks

Several data collection algorithms, which are based on the combination of using mobile sinks and multiple-hop forwarding, have been proposed to prolong the network lifetime of wireless sensor networks. However, most approaches treat the collection point selection and touring path planning as two independent problems, which leads to a sub-optimal solution for data collection. This article proposed an ant colony optimization based end-to-end data collection strategy to perform the collection point selection and the touring path planning simultaneously. The proposed algorithm first constructs a data-forwarding tree, and then heuristically selects collection points and plans a touring path at the same time. The performance evaluation shows that the end-to-end strategy can improve the network lifetime of wireless sensor network compared to other approaches, especially in the unbalanced distribution scenario of sensors. The end-to-end strategy is also capable of being integrated with other methods.


Introduction
Wireless sensor network (WSN) is a distributed network for sensing the environment. Because of the flexibility, WSNs have been widely used in several applications, such as smart homes, 1 medical care, 2 environmental monitoring, 3 disaster prediction, 4 and military field 5 . A WSN consists of a set of sensor nodes deployed in a certain area. Each sensor has limited battery capacity and data buffer of a fixed size, which leads to the issue of energy efficiency. Thus, several protocols are used to achieve the energy efficiency in WSN. Those protocols are mainly categorized into four classes: ratio module, data reduction, sleep/awake schemes, and battery charging. 6 The ratio module considers four states for each sensor node: idle, sleep, transmit, and receive. Most of the time, sensors are in idle state which causes depletion of the sensor node.
The ratio module minimizes the energy consumption by modulation of optimization parameters. However, the ratio module has to manage four states which introduce additional energy consumption. The next category, data reduction, minimizes the quantity of transmitted data packets to achieve energy efficiency 1 and reduce the rate of traffic and latency, too. Nevertheless, it inevitably losses the accuracy and precision of original data. The sleep/awake scheme employs the duty cycling mechanism to diminish the idle state and support the sleep mode. However, the duty cycling leads to sleepy latency and wastes energy for redundant wakeups. The battery charging harvests energy from the wind, solar, and other new energy. It takes nodes with the uneven residual energy into consideration for designing the protocol.
In this article, we focus on the energy efficiency based on the sleep/awake scheme. The sensor nodes periodically generate sensing data and transfer data to the base station through multiple-hop forwarding. Due to the nature of multiple-hop forwarding, the nodes near the base station will receive and transfer more data packets so that those nodes will consume more energy. In this case, the nodes closest to the base station are prone to deplete the energy first. When those nodes are dead, the data of other nodes cannot arrive at the base station, and the network cannot work until the dead nodes are charged. The problem of unbalanced energy consumption will significantly shorten the lifetime of the network. 7 Recently, to alleviate the problem of nonuniform energy consumption, several studies have considered a novel data collection strategy with a mobile sink. The mobile sink plays the role of a data transporter that moves around the WSN and collects the data packets from sensor nodes. There are two major categories of those studies: no-data-forwarding and hybrid approaches. 8 The no-data-forwarding methods apply the mobile sink to visit all sensor nodes to gather data packets, where no communication between sensor nodes is needed and the energy consumption of whole network will significantly reduce. But the touring path of mobile sink is too long, which leads to longer delay of network. The hybrid approaches use some data collection points (CPs) to communicate with some part of sensor nodes and gather their data packets. The mobile sink only visits the CPs to gather data packets. The key is to tradeoff the delay of data collection and energy consumption of communication between nodes. Most of those methods select CPs and plan a touring path for the mobile sink to visit all selected CPs separately.
However, the two problems of CPs selection and path planning interrelate with each other. The selected CPs determine a corresponding touring path while maximum touring length influences the process of selecting CPs. Independently solving the two problems will result in the low utilization of the given maximum touring length, and the full use of maximum length can effectively prolong the network lifetime. As shown in Figure 1, due to the separation of CPs selection and path planning and the local optimization of each stage, it is not uncommon to obtain a result with lower path utility.
To further prolong the network lifetime, we investigate whether we can solve both CPs selection and path planning simultaneously. The following summarize the contributions of this article: We proposed an end-to-end data collection strategy, which selects CPs and plans the touring path heuristically and simultaneously. We first construct a data-forwarding tree (DF-Tree) and then apply an ant colony optimal (ACO)-based algorithm to select CPs in order. We introduced a path estimation method modified from the path construction of EAPC 8 to improve the ACO-based algorithm. Instead of the path found by the best ant, a re-estimated shortest path is used to update the pheromone of ACO. The remaining parts of this article are structured as follows. We review the related works in section ''Related works,'' formulate the problem in section ''Problem formulation,'' detail the proposed end-to-end method in section ''End-to-end data collection strategy,'' show the performance of proposed method in section ''Performance evaluation,'' and conclude this article in section ''Conclusion and future works.''

Related works
In the past few years, several energy-efficient routing protocols for WSN have been proposed. In Nabavi et al., 9 the multi-objective greedy method is designed to find the optimal routing path, where each node selects another node as next hop based on several factors for transmitting information. To balance the energy consumption among all sensor nodes and further prolong the network lifetime, most approaches 10-14 employ a hierarchical routing protocol, where all nodes are first divided into several clusters and then a cluster head is elected to maintain other nodes in each cluster. Although those routing protocols help to reduce the energy consumption, the performance still remains inadequate. In those methods, the nodes closed to the base station will deplete energy more quickly and become the bottleneck of the whole network, which lowers the upper bound of the performance. To alleviate this issue, several mobile sink-based data collection protocols have been proposed, which adopt the mobile sink to gather data packets.
In those methods, a number of data CPs are selected and responsible for receiving data packets from other sensor nodes. The mobile sink moves periodically within the WSN to visit CPs and gather all data packets. According to whether or not the mobile sink passes through the center of the CPs, those methods can be classified into two mainly categories: stop and collect (SC) and collect while moving (CWM).

SC
The SC policy requires that the mobile sink should stop at the location of the sensor for communication. The mobile sink will passes through the center of sensors to collect data. Those methods can be further divided into two classes: no-data-forwarding and hybrid methods.
No-data-forwarding. In the class of no-data-forwarding, most methods have adopted one or more mobile sinks to visit all sensors, which can avoid the multiple-hop communication between sensor nodes. In Ma and collegues, 15,16 the problem of data collection is formulated as the single-hop data gathering problem (SHDGP), which aims to find the shortest touring path for the mobile sink to visits all sensors. A heuristic tourplanning algorithm for a single mobile sink is proposed to solve the SHDGP. However, the speed of moving the mobile sink is much lower than that of dataforwarding through the network, which will result in a higher latency of data update. In most applications of the WSN, the delay is an important performance metric. A WSN with low latency should be designed. One possible solution is to apply multiple mobile sinks to collect data together. Some studies [17][18][19] have proposed several data collection strategies with multiple mobile sinks. Although those studies reduce the latency of data collection, the high cost of mobile sink makes them impractical. To alleviate the problem of the high latency and the high cost caused by mobile sink, some hybrid methods are proposed.
Hybrid. Those hybrid methods 20-26 combines multiplehop forwarding with the use of mobile sink, where some nodes are selected as CPs. All sensor nodes send their data to the nearest CP through multiple-hop. And the mobile sink periodically visits all CPs to collect data packets through a determined touring path constrained by delay. Since the mobile sink has a fixed moving speed, the time delay constraint can be transformed into a maximum length constraint of the touring path of the mobile sink. Most hybrid methods transform the problem of data collection into the problem of CPs selection and path planning, whose objective is to minimize the energy consumption under a maximum touring length constraint. They first cluster all sensor nodes into several classes, and select a cluster head as the CP. Then, the path planning is solved as the travel salesman problem (TSP) using some heuristic algorithms. In Krishnan and Lim, 27 the cluster heads are chosen dynamically based on the remaining energy and then a model free Q-learning approach is introduced to find the shortest routing path. In those methods, selecting CPs and planning the tour are treated as two independent problems, which lead to the sub-optimal solution for the data collection strategy.

CWM
More recently, a new data collection protocol has been proposed, where the mobile can communicate with a sensor when it passes through the communication range of the sensor. The study by Wen et al. 8 selects CPs accounting for the path cost from current CP to next point and the forwarding load of each sensor, and then constructs a data collection path. The data can be collected by mobile sink if the path passes through the communication range of these sensors. The study by Chang et al. 28 dynamically adopts the proper transmission rate based on the data amount of each sensor to construct the shortest and energy-efficient touring path but guarantees the completeness of data collection. In Donta et al., 29 an extended ant colony optimization (ACO)-based method is first designed to select the best set of rendezvous points and the efficient touring path of mobile sink and then a virtual rendezvous points selection is performed to increase the data gathering speed. Although the protocol can reduce the constructed path length, the design of the protocol is more complicated. Sensors should have the ability to sense the location of the mobile sink and the duration that the mobile sink locates in the communication range should be enough to transfer all data packets.
In this article, we focus on the SC policy and propose a heuristic end-to-end data collection strategy based on ACO, where CPs selection and tour planning perform at the same time. The proposed ACO-based algorithm can be extended to combining other methods which solve the CP selection and path planning independently, to fully use the maximum path length.

Problem formulation
In this article, we focus on a WSN in which sensor nodes periodically generate data packets. All the data packets must be transmitted to the base station within a given delay. There are a set of CPs that responsible for collecting and storing the data packets generated by a certain group of sensor nodes. And a mobile sink can move around the WSN to collect data from the CPs. The objective is to select the set of CPs and plan a tour that passes through all CPs for mobile sink within the given delay. We first detail the problem description, outline some assumptions, and then formulate the problem definition.

Problem description
A WSN consists of a set of sensor nodes, a communication topology network, and a base station. Each node is responsible for sensing environmental status and generating data packets periodically. Those data packets must be transmitted to the base station within a delay T . In order to decrease the energy consumption and prolong the lifetime of WSN, a mobile sink is used to collect data packets. Because the mobile sink has a fixed moving speed, the traveling time is proportional to the touring length. Given the delay T , there is a length L max corresponding to it where L max denotes the maximum touring length. To satisfy the length constrain, all sensor nodes are divided into several groups. Each group selects a node as the CP, which is used to gather data packets from other nodes and upload those packets to the mobile sink. Other nodes will transmit their data packets to the CP through multiple-hop. The mobile sink visits all CPs and collects all data packets in a round.
Transmitting and receiving one data packet will consume several energies. The energy consumption is proportional to the packet size. In this article, we assume the sensor nodes generate one data packet with fixed size in a round. We can formulate the energy consumption of sensor i in each round as following where E t and E r denote the energy consumption of transmitting and receiving one packet, respectively, and nd i is the number of packets sent to sensor i. The lifetime of WSN is dependent on the node which first runs out of energy. It is obviously that the CPs consume the most energy. To prolong the lifetime of WSN, we should minimize the energy consumption of the CP which has maximum energy consumption. The objective is defined as following min max where m is the number of CPs. From equation (1), we can find that E i }nd i . For the CP i, the (nd i + 1) is equal to the number of nodes in the group containing CP i. Minimizing the energy consumption of a CP is equivalent to minimizing the size of the group containing the CP.

Assumption
We make some assumptions as following: The mobile sink moves at a fixed speed. The mobile sink collects data packets from a CP only when it arrives at the position of the CP.
Comparing to the touring time of mobile sink, the communication time between the mobile sink and a CP is small enough to be neglected. Similarly, the time of transmission and reception between sensor nodes is negligible compared to the touring time. Each CP has enough capacity to store all sensed data.
The sensor nodes have a fixed communication range and initial energy. There are no isolated nodes, and each sensor node can communicate with other nodes with multiple-hop. Each sensor node produces on data packet each round.
The energy of mobile sink is sufficient to visit all CPs and gather data packets in a round within the given delay.

Problem definition
The topology of WSN can be formulated as an undirected graph G = hV , Ei, where V is the set of sensor nodes in WSN and E is the edges between two sensor nodes where

End-to-end data collection strategy
This section details our proposed algorithm for solving the problem described in section ''Problem formulation.'' The proposed algorithm is based on the key observation that selecting CPs and planning a tour are not independent. The proposed data collection strategy constructs a DF-Tree first and then uses the ACObased algorithm to jointly optimize the CPs selection and path planning. The following details the proposed method and describes how the method can be extended.

Initial phase
Given a graph G = hV , Ei denoting the communication topology of WSN, we first construct a DF-Tree. The DF-Tree is rooted by the base station. All nodes in the tree can only send their data packets to their parent nodes. When a node is selected as the CP, there are several changes to the tree structure. First, the edge between the CP and its parent is broken. That means the CP does not need to send packets to its parent but directly upload to the mobile sink. In addition, a new tree rooted by the CP is generated. In this case, an isolated tree can be viewed as a group. All nodes forward the data to the CP through their parents in their own groups. In this case, the nd i mentioned above denotes the number of nodes in the sub-tree rooted by node i .
Motivated by Wen et al., 8 we construct a minimum spanning tree as the DF-Tree which has the minimum total forwarding cost. The Prim algorithm is applied to construct the DF-Tree.
In addition, a complete graph G 0 = hV , E 0 i is constructed based on the original input graph G, where E 0 denotes the possible movements of the mobile sink.

End-to-end joint optimization framework
Based on the constructed DF-Tree, we aim to select CPs and plan a path. The essence of path planning is to determine the order of the CPs. In fact, the process of selecting CPs also can be viewed as a kind of access order of CPs. Based on this insight, we propose an endto-end framework to jointly optimize the two problems using improved ACO methods.
ACO for DC. ACO is a heuristic algorithm, which has been proved to be effective for solving combinatorial optimization problem. The basic idea of ACO is motivated by practical ant colony system. When an ant walks on the road, it will release some pheromone. The ants will prefer to move toward the direction which contains more pheromone. While the path with the better solution always contains more pheromone. In the end, most ants will likely choose the best path which has the most pheromone. The key of ACO is how to set up and update the pheromone. The goal of the problem is to minimize max nd i . A better solution of the problem means a smaller max nd i . Based on this point, we propose an ACO-based algorithm to solve CPs selection and tour planning simultaneously. We iteratively update the pheromone using the best result of the elitist ant at each iteration. In each iteration, several ants search the feasible solutions based on a probability function independently. And then, the best solution with minimum max nd i is used to update the pheromone. The probability function is defined as following where P ij denotes the probability of moving from node i to node j for an ant, J (i) is the set of neighbor nodes of node i in the graph G 0 but have not been visited for the ant, t ij is the amount of pheromone, and h ij is the heuristic value which is defined as following where dist(i, j) is the Euclidean distance, and w j denotes the gain value of selecting node j as the CP which is defined as following where H(j, i) is the hop distance from node j to node i in the DF-Tree, and root j is the root of the sub-tree containing node j . When node j is selected as CP, node j does not need to send data packets to its parent. And all nodes in the path from node j to the root will save energy consumption.
denotes the best solution after kth iteration. The elements in the T present the CPs and the order corresponds to the touring order of the mobile sink. The following rule is used to update the pheromone where r is the pheromone evaporation rate, and Q and s are the tunable controlling scalar. Dt denote the increment of the pheromone, M nd is the maximum of all nd i , and F is the fairness index (FI), which describes the degree of load balance for all CPs. The proposed ACO-based algorithm details in Algorithm 1.
Path estimation. Because of the randomness of ACObased algorithm, it is easy to get a path in which there are several intersecting segments. With the same CPs, a touring path with intersecting segments means a longer touring length. The problem of intersecting segments also leads to the algorithm get a sub-optimal solution. To alleviate the problem, we utilize an efficient path estimation algorithm to compute the touring length L T and refine the touring path. In Algorithm 1, a minimum touring length is estimated instead of directly computing the length based on T at step 24. At step 29, the T best is converted to a polygon to avoid the segment intersection. The path planning problem of the CPs is a traveler salesman problem that has been proved to be a NP-hard problem. Motivated by Wen et al., 8 we introduce a convex polygon-based path estimation algorithm. The algorithm is similar to the path construction proposed in Wen et al. 8 The following briefly presents the method.
The algorithm consists of three steps: the construction of the convex polygon, the connection of internal points, and the estimation of touring length.
Construction of the convex polygon. This step aims to construct a bounding box of all CPs where the box is a convex polygon and has the minimum area. First, the point at the bottom-left is selected as the start point p 0 . Let l be a horizontal line passing through p 0 . Then, we take the p 0 as the center and turn l in an anticlockwise direction until the line touches any other CP, which willed be labeled with p 1 . Then the touched point serves as the new center and the turning operation is repeatedly executed. The touched points will be labeled orderly with fp 1 , p 2 , p 3 , :::g. The operation will be repeated until the touched point is the start point p 0 . At the end, a convex polygon hp 0 , p 1 , p 2 , :::, p k , p 0 i is constructed where (k + 1) denotes the number of vertexes of the convex polygon. Note that there is something different about the selection of the start point between our method and Wen et al. 8 The start point is the base station in Wen et al., 8 while it is the point at the bottom-left in our method. for j = 0 ! jVj do 12: if v j in T then 13: Connection of the internal points. After the convex polygon is constructed, there may remain some unconnected points. Obviously, those points are all located at the inside of the convex polygon. We iteratively joint the remaining points into the polygon. Let P CP and P R denote the set of vertexes in the convex polygon and the set of the remaining points, respectively. At each iteration, a point V 2 P R with the least selection cost will be selected and inserted into the polygon. The selection cost of a point V is defined as following The cost V evaluates the increment of length after inserting point V into the polygon which is proposed in Wen et al. 8 The position to insert the selected point V 0 is at pos, where pos = arg min where P i 2 P CP . After the point V 0 is inserted into the polygon, there are several changes in the polygon. The V 0 will be labeled with P pos , and the index of the label of the original points fP pos , P pos + 1 , :::g will plus one, that is, their new labels are fP pos + 1 , P pos + 2 , :::g successively. In addition, the point V 0 will be removed from P R . Then, the above process will be repeatedly executed until the P R is empty.
Estimation of touring length. When P R is empty, the P CP presents a touring path which is a polygon. We use the total length of the polygon to estimate the minimum length of the touring path. Note that the estimated length and the polygon are only applied at Step 24 and Step 29, respectively, in Algorithm 1.

Complexity analysis
The proposed data collection strategy consists of an initial phase and an ACO-based CPs selection and tour planning. In the initial phase, the Prim algorithm is employed to construct the DF-Tree, whose time complexity is O(n 2 ) where n denotes the number of sensors.
In the ACO-based CPs selection and tour-planning stage, which is described in

Combining with other methods
Due to the flexibility of the parameters setting of ACO, the proposed end-to-end strategy can be easily extended to other methods. Given a data collection algorithm G that independently solves the CP selection and path planning, we first construct a heuristic value table h to describe the process of selecting CPs. The h ij denotes the probability of selecting node j as next CP when current CP is node i in G. Then, the h ij is applied in equation (5). In this case, the Algorithm 1 can be viewed as a combination of ACO and G.

Performance evaluation
To validate the effectiveness of the proposed method, we have conducted extensive experiments. The Python simulator is used to simulate the network environment. We consider a network area that measures 300 m 3 300 m. Several sensor nodes are randomly deployed over the network and can communicate with each other through multiple-hop. The communication range is 30 m. In general, we assume that each sensor node generates one packet per round of data collection, where the packet has a fixed size. And the consumed energy for transmitting and receiving one packet is 0.021 and 0.015 J, respectively. The initial battery energy of each sensor is 20 J. In addition, the moving speed of the mobile sink is 1 m/s. All the simulation settings are described in Table 1.
The deployment of sensor nodes is dependent on its certain application. Different deployment characteristics will result in different performance evaluations. We conduct extensive experiment on two scenarios: Unbalanced deployment (UD) scenario: all sensor nodes are uniformly deployed in the network area.  To benchmark our method, we compare it to the version of ''stop and collect'' of Energy-Aware Path Construction (EAPC-SC) algorithm 8 in terms of network lifetime, total energy consumption, FI, efficient index (EI), and running time. The EAPC-SC algorithm has more efficient performance than existing data collection mechanisms, which is applied to solve the same problem as the considered in this article. The EAPC-SC first constructs a minimal spanning tree (MST), selects a set of CPs, and then constructs a data collection path. The selection of CPs is based on a heuristic function, which is the same as the h ij in our proposed algorithm. In addition, the path estimation is a modified version of the path construction of the EAPC-SC. In somewhat, our proposed algorithm is an evolution of the EAPC-SC. Our proposed method handles the problem of sub-optimal solution caused by the independence between CPs selection and path planning. We compare with the EAPC-SC algorithm to verify the feasibility of the proposed method. Besides, we also compare our method with the dynamic directional routing (DDR) protocol, 30 which adapt to the mobility of sensor nodes to optimize the routes toward the static sink. The DDR protocol is a state-of-the-art routing algorithm without mobile sink, which achieve a reliable and energy-efficient routing.
In the experiments, the distributions of sensors are randomly initialized and the proposed algorithm is based on the ACO algorithm which is stochastic. Therefore, we conducted 10 repeated experiments to alleviate the effect from randomness.

Network lifetime
The network lifetime describes the maximum working time of the whole WSN, which is an important metric of a well-designed WSN. And it uses round as a unit of measure. The higher value denotes better performance. In each round, a mobile sink visits all CPs based on the arranged routes and collects all data packets from CPs. The lifetime is defined as the time until one node runs out of energy. And very simplistically, we assume the sensors consuming energy only for transmitting and receiving data packets. The lifetime of network depends on the number of total sensor nodes and the touring path of mobile sink. We compare the lifetime under various numbers of sensor nodes for two scenarios, where the number of sensors ranges from 10 to 300. The results are shown in Tables 2 and 3, where the

Total energy consumption
Total energy consumption is defined as the sum of the energy consumption of all sensors in the WSN per round. Figure 3 compares the total energy consumption of WSN for different algorithms under two scenarios. The result shows that our proposed method results in comparable energy consumption per round with other protocols.

FI
Since the mobile sink-based data collection strategy is introduced to alleviate the problem of non-uniform energy consumption, the degree of uniformity should be quantified. A higher uniformity of energy consumption will lead to lower total energy consumption and higher network lifetime. The FI reflects the degree of the balance between nodes, which is defined as following where n is the number of sensor nodes and E i is the energy consumption of sensor i per round. Figure 4 compares the FI of different methods, where reveals the proposed end-to-end strategy can also get a higher up-bound of the FI.

EI
The given delay constraint is correspondent to a maximum touring length for mobile sink. The full use of the maximum length can effectively improve the network lifetime and FI. The EI describes the utilization of the given maximum length, which is defined as following where m is the number of CPs, N is the total number of sensors, L act and L max denote the actual touring length of mobile sink and the maximum length, respectively. Figure 5 compares the EI of different methods. The result shows that our method leads to a comparable EI to EAPC. Since the DDR protocol uses the static sink, the actual touring path length is zero.

Running time
The running time of different methods with various numbers of nodes is compared in Figure 6. The time complexity of EAPC and DDR methods is of the same magnitude, and the running time of the two methods   drew in the figure is nearly overlapped. To better distinguish, we change the scale of the vertical axis to logarithmic. Our proposed method is more time-consuming and has minor responsiveness, which makes it hard to be applied in online situations. However, there are many applications running algorithms to generate data collection strategies offline, where the structure of WSN is almost invariant and the proposed method is performed only when the structure is changed. Thus, the running time is acceptable in the offline cases.

Conclusion and future works
Data collection is an important topic in WSNs because it is energy-consumed and affects the network lifetime. The use of mobile sink can significantly reduce the energy consumption and prolong the network lifetime.
In this article, we proposed an end-to-end data collection strategy, which selects appropriate sensor nodes as CPs and plans a path of visiting all CPs for mobile sink simultaneously. The proposed strategy is based on ACO and can be extended to combining with other data collection approaches. The experimental results reveal that the proposed end-to-end strategy can achieve a better performance in terms of network lifetime, total energy consumption, and FI, especially in the UD scenario.
In the future, we will extend our end-to-end strategy to combining with other data collection methods to demonstrate the generalization capacity of our method. Currently, we do not pay much attention to the runtime performance issues, it is promising for further performance improvements on other methods.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.