Constructing Maximum-Lifetime Data-Gathering Tree in WSNs Based on Compressed Sensing

Data gathering is one of the most important operations in many wireless sensor network (WSN) applications. To implement data gathering, a tree structure rooted at the sink is usually defined. In most WSNs, nodes are powered by batteries with limited energy, so prolonging network lifetime is a critical issue. As a signal processing technique, compressed sensing (CS) is being increasingly applied to WSNs to save energy: it can reduce the number of data transmissions and balance the traffic load throughout the network. In this paper, we investigate data gathering in WSNs using CS and aim at constructing a maximum-lifetime data-gathering tree. The lifetime of the network is defined as the number of data-gathering rounds until the first node depletes its energy. Based on the hybrid-CS data-gathering model, we first construct an arbitrary data-gathering tree and then use a random switching decision and an optimal parent node selecting strategy to adjust the load of the bottleneck node and prolong the network lifetime. Simulation results show that the proposed algorithm outperforms several existing approaches in terms of network lifetime.


Introduction
Wireless sensor networks (WSNs) consist of a great number of nodes that sense the environment and collaboratively work to process and route the sensory data. They have a great application value in the fields of habitat monitoring, medical care, battlefield monitoring, and so on. Data gathering is a basic operation in WSNs, where sensors are responsible for collecting all sensory data and delivering them to the sink node [1]. Typically, sensors have strictly limited communication abilities and energy resources; therefore, how to reduce energy consumption in data gathering so as to prolong network lifetime is an important issue [2].
The problem of energy-efficient data gathering in WSNs has been extensively investigated. Cluster-based and tree-based data-gathering methods have been widely proposed in the literature [3,4]. The goal of such methods is to construct a network topology that uses the energy resources of sensor nodes effectively. However, these methods cannot overcome the "hot spot" problem; that is, nodes close to the sink suffer from heavy loads and exhaust their energy quickly, which shortens the network lifetime. Data aggregation [5] and source coding [6] are two efficient methods to overcome the "hot spot" problem. However, data aggregation adopts a simple aggregation function that extracts only certain statistical information from the sensory data, so the sink node cannot recover all sensory data. Hence, this technique can only be applied to particular applications that require partial information from WSNs. The distributed source coding technique performs noncollaborative data compression at the source nodes; it is not practical in WSNs either, because it requires prior knowledge of the data correlation that is usually unavailable or because it results in a high communication load.
Compressed sensing is a signal processing technique for efficiently acquiring and reconstructing a signal by finding solutions to underdetermined linear systems [7]. This technique promises to recover the original signal from far fewer measurements than its original dimension, as long as the signal is sparse or compressible in some domain. Recently, the growing interest in CS has inspired many works in WSNs [8][9][10][11]. Many studies show that the CS technique is well suited to data gathering in WSNs because sensory data is generally quite sparse in nature [8]. The authors in [12] made the first effort to explore the potential of using CS theory for data gathering in multihop WSNs. However, they did not propose a concrete scheme based on this initial idea. The authors in [8] first presented a complete framework for data gathering based on CS in large-scale WSNs where a large number of sensor nodes are densely deployed and sensor readings are spatially correlated. In Compressive Data Gathering (CDG), proposed in [8], the sensory data are transmitted as weighted sums on a routing path so as to reduce the global communication cost without introducing intensive computation or complicated transmission control. However, CDG can result in many unnecessary data transmissions during the data-gathering procedure.
International Journal of Distributed Sensor Networks
In [13], the authors introduced two data-gathering models, namely, plain-CS and hybrid-CS. The plain-CS model, which is the same as the CDG scheme, uses CS encoding for all nodes in the network. The hybrid-CS model applies CS only to relay nodes that are overloaded, and thus combines the merits of nonaggregation data gathering and the plain-CS scheme. Recently, Xiang et al. in [14] introduced a Minimum Energy Compressed Data Aggregation (MECDA) algorithm to implement data gathering in WSNs based on the hybrid-CS technique. The aim of MECDA is to allocate the traffic load properly in a given data-gathering round, through joint routing and aggregator assignment, such that the total energy consumption is minimized. Unfortunately, MECDA takes into account only the overall energy consumption of the sensor network and does not balance the energy consumption of individual nodes; therefore, the energy of some sensor nodes may be exhausted quickly, which eventually shortens the network lifetime. It is therefore important to design a maximum-lifetime data-gathering method that balances the energy consumption of sensor nodes by using CS theory in the design of the underlying routing.
In this paper, we propose a method of constructing a maximum-lifetime data-gathering tree (MLDGT) in WSNs based on the hybrid-CS data-gathering model. The contributions of this paper are as follows: (1) We first define the problem of the maximum-lifetime data-gathering tree in WSNs based on compressed sensing, prove that it is NP-complete, and then analyze the characteristics of the optimal solutions. (2) We propose a novel algorithm, which uses random switching and an optimal parent node selecting strategy to keep transferring descendants of the bottleneck node, thereby balancing the load and extending the lifetime of WSNs. (3) Extensive experiments have been conducted to verify that our algorithm achieves a longer lifetime than existing approaches.
The rest of the paper is organized as follows. In Section 2, we give a brief overview of applying compressed sensing in data gathering of WSNs. Section 3 formally defines the problem of maximum-lifetime data-gathering tree and analyzes the complexity of the problem. The proposed algorithms are detailed and analyzed in Section 4. Simulation results are presented in Section 5 and conclusions are drawn in Section 6.

Compressed Sensing Basics.
In the CS framework, a real-valued signal $X \in \mathbb{R}^N$ can be reconstructed from far fewer measurements than its original dimension, as long as $X$ is sparse or compressible in a certain domain. Let $X = [x_1, x_2, \ldots, x_N]^T$ be the $N$-dimensional original data, which can be represented in some orthonormal basis $\Psi = \{\psi_i\}_{i=1}^{N}$ (where each $\psi_i$ is an $N$-dimensional column vector) as $X = \sum_{i=1}^{N} s_i \psi_i = \Psi S$. If the coefficient vector $S = [s_1, s_2, \ldots, s_N]^T$ is $K$-sparse, that is, there are only $K$ ($K \ll N$) nonzero entries in $S$, the signal $X$ is $K$-sparse in basis $\Psi$. Let $\Phi = [\varphi_1, \varphi_2, \ldots, \varphi_N]$ be the $M \times N$ measurement matrix whose row vectors are largely incoherent with $\Psi$. Suppose $Y$ is the measurement vector of the $K$-sparse $X$; that is, $Y = \Phi X = \Phi \Psi S$.
According to CS theory, we can recover $X$ from $Y$ by solving the following optimization problem:
$$\min_{S} \|S\|_{\ell_1} \quad \text{s.t.} \quad Y = \Phi \Psi S. \tag{1}$$
Suppose $\hat{S}$ is the optimal solution of optimization problem (1). The recovered signal can then be obtained by
$$\hat{X} = \Psi \hat{S}. \tag{2}$$
The difference between the original signal $X$ and the recovered signal $\hat{X}$ represents the practical performance of CS coding, which depends on the sparsity of the original signal as well as on the reconstruction algorithm. Because the sensory data of WSNs is generally quite sparse in nature, CS is well suited to data gathering in WSNs [15,16]. In this paper, we define the ratio of the dimension of the measurement vector $Y$ to the dimension of the original sensory data vector $X$ as the sampling rate $\rho$; that is, $\rho = M/N$. The sampling rate is an important parameter that affects the energy consumption of nodes in a data-gathering tree that is constructed based on CS.

Overview of Data Gathering Based on Compressed Sensing in WSNs.
For the traditional data-gathering method of WSNs, nodes around the sink need to transmit more raw data and consume more energy than the downstream nodes of the sensor network. The unbalanced energy consumption has a major impact on network lifetime. In [8], Luo et al. proposed CDG for data gathering in large-scale WSNs, which is an efficient way to alleviate the bottleneck. To illustrate the idea of data gathering based on CS, we rewrite the CS coding as
$$Y = \Phi X = \sum_{i=1}^{N} \varphi_i x_i, \tag{3}$$
where $x_i$ is the sensory data of node $v_i$ and $\varphi_i$ is the $i$th column vector of measurement matrix $\Phi$. In CDG, instead of individual readings, the weighted sums of sensor readings are delivered to the sink. In the process of data gathering, node $v_i$ first multiplies its reading $x_i$ by $\varphi_i$ and then adds the product to the $M$-dimensional column vector $\sum_{j=1}^{i-1} \varphi_j x_j$ received from its children nodes. The sum $\sum_{j=1}^{i} \varphi_j x_j = \sum_{j=1}^{i-1} \varphi_j x_j + \varphi_i x_i$ is then sent to the parent node of $v_i$. If each node performs this operation, the compressed measurement is completed during the process of data gathering. Eventually, the sink collects the $M$-dimensional vector $Y$, and a decoding algorithm is used to recover the original sensory data. The method described above is referred to as the plain-CS model. In plain-CS data gathering, each node sends exactly an $M$-dimensional vector irrespective of what it has received, so there is no "hot spot" phenomenon.
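The hop-by-hop accumulation described above can be illustrated with a minimal sketch on a chain topology. The function name `chain_cs_gather`, the Gaussian measurement matrix, and the sizes are illustrative assumptions rather than details taken from [8].

```python
import numpy as np

def chain_cs_gather(readings, phi):
    """Accumulate the weighted sum hop by hop along a chain v_1 -> ... -> v_N -> sink."""
    m, n = phi.shape
    partial = np.zeros(m)                 # v_1 receives nothing from upstream
    for i in range(n):
        # Node i multiplies its reading by its column of Phi, adds the product
        # to the M-dimensional vector it received, and forwards the new sum.
        partial = partial + phi[:, i] * readings[i]
    return partial                        # the M-dimensional vector Y at the sink

rng = np.random.default_rng(0)            # assumed random Gaussian measurement matrix
N, M = 8, 3
phi = rng.standard_normal((M, N))
x = rng.standard_normal(N)
y = chain_cs_gather(x, phi)
print(np.allclose(y, phi @ x))            # the in-network sum matches Y = Phi X
```

Every node forwards exactly `M` values, which is why the per-node load is flat in the plain-CS model.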
However, the plain-CS model may lead to unnecessarily high traffic at the earlier stages of data gathering. Therefore, Luo et al. in [13] proposed the hybrid-CS data-gathering model. The main idea of the hybrid-CS model is that the non-CS scheme is applied in the earlier stages of data gathering, and CS-based compression is applied only at nodes whose incoming traffic intensity becomes larger than or equal to $M$. The data-gathering process based on the hybrid-CS model is depicted in Figure 1 through a simple chain-type topology. There are $N$ sensor nodes $\{v_1, v_2, \ldots, v_N\}$, and each node senses one piece of data in a round of data gathering. These data need to be transmitted to the sink node. During the transmission, sensor nodes $v_1, v_2, \ldots, v_M$ transmit the raw sensory data $x_1, x_2, \ldots, x_M$ directly, and the amount of data transmitted by each of these nodes is no more than $M$. Nodes $v_{M+1}, v_{M+2}, \ldots, v_N$, however, transmit data based on CS; that is, node $v_i$ ($M < i \le N$) transmits $\sum_{j=1}^{i} \varphi_j x_j$, and the transmission load of each node in the set $\{v_{M+1}, v_{M+2}, \ldots, v_N\}$ is equal to $M$.
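As a small worked illustration of the chain example, the sketch below compares the per-node transmission load of raw forwarding, plain-CS, and hybrid-CS. The function name `chain_loads` and the values of `n` and `m` are hypothetical choices for illustration.

```python
def chain_loads(n, m):
    """Per-node transmission load on a chain v_1 -> ... -> v_n -> sink."""
    raw = [i for i in range(1, n + 1)]              # non-CS: v_i relays i raw readings
    plain = [m] * n                                 # plain-CS: every node sends M values
    hybrid = [min(i, m) for i in range(1, n + 1)]   # hybrid-CS: raw until the load reaches M
    return raw, plain, hybrid

raw, plain, hybrid = chain_loads(n=10, m=3)
print(hybrid)                                       # [1, 2, 3, 3, 3, 3, 3, 3, 3, 3]
print(sum(raw), sum(plain), sum(hybrid))            # 55 30 27
```

In this toy case hybrid-CS never sends more than either alternative at any node, which is the motivation for using it as the underlying model.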
In this paper, we focus on maximizing the lifetime of data-gathering trees based on hybrid-CS model. As can be seen from the above analysis, the energy consumption characteristic of sensor node brings about a new challenge for constructing a maximum-lifetime data-gathering tree based on hybrid-CS model.

System Model.
We consider a static wireless sensor network with $N$ sensor nodes $\{v_1, v_2, \ldots, v_N\}$ and one sink node $s$. All sensor nodes are randomly deployed over an $a \times a$ Euclidean plane to continuously monitor the environment and periodically report data to the sink. We use an undirected graph $G(V, E)$ to represent the WSN, where $V$ is the set of all nodes and $E$ is the set of edges. The underlying communication graph is a unit disk graph. All nodes have the same transmission range $r$. There is an edge $(v_i, v_j) \in E$ whenever the Euclidean distance between nodes $v_i$ and $v_j$ satisfies $\|v_i - v_j\| \le r$. Each node has its own initial energy; let $E(v_i)$ denote the initial energy of node $v_i$. The sink is AC-powered, so it has an unlimited power supply and powerful computation ability. Since data communication is the dominant source of energy consumption in a sensor network, we take into account only the energy consumption of communication. At each data-gathering round, each node generates a reading, which is referred to as one-unit data. We denote by Rx and Tx the energy consumption of one-unit data reception and transmission, respectively.
The architecture for hybrid-CS data gathering that we adopt, following [13,14,17], is shown in Figure 2. A spanning tree rooted at the sink is used to gather the readings of the whole network. In Figure 2, each arrowed line represents a transmission link, and the data on the left side of each line is ready for transmission. Leaf nodes, such as nodes $v_1 \sim v_M$ and $v_{M+3} \sim v_{M+6}$, transmit their original sensory data to their parents. For other nodes, the outgoing traffic has two possibilities:
(1) When the number of data items received from node $v$'s children is less than $M$, where $M$ is the dimension of column vector $\varphi_i$, node $v$ forwards the received data and its own data directly. Node $v$ is referred to as a forwarding node. In Figure 2, the outgoing traffic of node $v_{M+7}$ is the data set $\{x_{M+3}, \ldots, x_{M+6}, x_{M+7}\}$, in which $x_{M+7}$ is its own sensory data; the others are received from its children.
(2) When the number of data items received from node $v$'s children is not less than $M$, node $v$ performs the CS coding according to (3) and transmits exactly $M$ encoded data items corresponding to the aggregated column vector. We call node $v$ an aggregating node. In Figure 2, node $v_{M+1}$ is an aggregating node and performs the CS coding $\sum_{i=1}^{M+1} \varphi_i x_i$. The outgoing traffic of $v_{M+1}$ is always $M$.
Based on the above analysis, in a given data-gathering tree $T$, the outgoing traffic of node $v$ is
$$D_{\mathrm{out}}(T, v) = \min\Big\{ \sum_{u \in \mathrm{Ch}(v)} D_{\mathrm{out}}(T, u) + 1,\ M \Big\}, \tag{4}$$
where $\mathrm{Ch}(v)$ is the set of $v$'s children nodes. Thus, the energy consumption of node $v$ in each data-gathering round is
$$E(T, v) = \mathrm{Rx} \sum_{u \in \mathrm{Ch}(v)} D_{\mathrm{out}}(T, u) + \mathrm{Tx}\, D_{\mathrm{out}}(T, v). \tag{5}$$
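The two cases above can be sketched as follows, assuming that a node's outgoing traffic is its incoming traffic plus its own reading, capped at `M`, and that its per-round energy is reception cost times incoming traffic plus transmission cost times outgoing traffic. The tree, the node names, and the function names are hypothetical; the energy constants are the ones used later in the simulation setup.

```python
def outgoing_traffic(children, node, m, cache=None):
    """Outgoing traffic of a node: min(incoming traffic + own reading, M)."""
    if cache is None:
        cache = {}
    if node not in cache:
        incoming = sum(outgoing_traffic(children, c, m, cache)
                       for c in children.get(node, []))
        cache[node] = min(incoming + 1, m)
    return cache[node]

def round_energy(children, node, m, e_rx, e_tx):
    """Energy a node spends per round: reception of incoming data plus transmission."""
    incoming = sum(outgoing_traffic(children, c, m) for c in children.get(node, []))
    return e_rx * incoming + e_tx * outgoing_traffic(children, node, m)

# Hypothetical tree: sink s <- a <- {b, c}, b <- {d, e, f}; M = 3.
children = {"s": ["a"], "a": ["b", "c"], "b": ["d", "e", "f"]}
M, RX, TX = 3, 6400, 12800                      # nJ per one-unit data
print(outgoing_traffic(children, "b", M))       # 3: b receives 3 items and aggregates
print(round_energy(children, "a", M, RX, TX))   # 64000 = 4 * 6400 + 3 * 12800
```

Node `c` is a leaf and sends one raw reading, while `a` and `b` both receive at least `M` items and therefore send exactly `M` encoded values.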

Problem Formulation.
In this paper, we define a data-gathering round as the process of gathering all data from the nodes to the sink, regardless of how much time it takes [18]. The lifetime of a node $v$ in tree $T$ is defined as the number of data-gathering rounds from when it begins to work until its death:
$$L(T, v) = \frac{E(v)}{E(T, v)}. \tag{6}$$
We also define the load, $\mathrm{Load}_{\mathrm{node}}(T, v)$, of node $v$ as the ratio of $E(T, v)$ to $E(v)$; that is,
$$\mathrm{Load}_{\mathrm{node}}(T, v) = \frac{E(T, v)}{E(v)}. \tag{7}$$
The lifetime of a tree $T$ is defined as the lifetime of the first dead node in the tree:
$$L(T) = \min_{v \in V} L(T, v). \tag{8}$$
For any network $G$, we refer to $\mathcal{A}(G)$ as the set of all possible trees for $G$. The problem of finding a maximum-lifetime data-gathering tree (MLDGT) based on the hybrid-CS model can be defined as
$$\max_{T \in \mathcal{A}(G)} L(T). \tag{9}$$
This problem, which is similar to the maximum-lifetime problems studied in [18][19][20], is NP-complete. The unique property of the problem is the energy consumption model of a node in a data-gathering tree based on the hybrid-CS model, which brings about new challenges for constructing the data-gathering tree. Since the problem is NP-complete, we seek an approximate solution. Details of the solution techniques are given in the next section.
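A minimal numeric sketch of these definitions follows. The per-round energies and initial energies are hypothetical values in nJ, and counting whole rounds with integer division is an assumption made here for illustration.

```python
def node_load(e_round, e_init):
    """Load of a node: per-round energy consumption divided by initial energy."""
    return e_round / e_init

def node_lifetime(e_round, e_init):
    """Number of complete rounds a node survives (whole rounds is an assumption)."""
    return e_init // e_round

def tree_lifetime(e_round, e_init):
    """Lifetime of the tree: lifetime of the first node to die."""
    return min(node_lifetime(e_round[v], e_init[v]) for v in e_round)

# Hypothetical per-round consumption and initial energy, both in nJ.
e_round = {"a": 64000, "b": 57600, "c": 12800, "d": 12800}
e_init = {"a": 800_000_000, "b": 600_000_000, "c": 500_000_000, "d": 500_000_000}

bottleneck = max(e_round, key=lambda v: node_load(e_round[v], e_init[v]))
print(bottleneck, tree_lifetime(e_round, e_init))   # b 10416
```

Node `a` consumes the most energy per round, but `b` has the highest load relative to its battery, so `b` determines the tree lifetime.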

Basic Idea of Solution Techniques.
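To facilitate the analysis, we first provide some relevant assumptions and definitions in this section. We denote by $P(T, v_i)$ the path from node $v_i$ to sink $s$ in spanning tree $T$. The expression $v_j \in P(T, v_i)$ means that node $v_j$ is a relaying node on the path $P(T, v_i)$. In a spanning tree $T$, the nodes on the path $P(T, v_i)$ may have different loads. We define the path load, $\mathrm{Load}_{\mathrm{path}}(T, v_i)$, of node $v_i$ as the maximum load of all nodes along the path from node $v_i$ to sink $s$; that is,
$$\mathrm{Load}_{\mathrm{path}}(T, v_i) = \max_{v_j \in P(T, v_i)} \mathrm{Load}_{\mathrm{node}}(T, v_j). \tag{10}$$
The node with the largest load on the path $P(T, v_i)$ is called the bottleneck node on the path of node $v_i$. Formula (9) can be transformed into an equivalent form:
$$\min_{T \in \mathcal{A}(G)} \max_{v \in V} \mathrm{Load}_{\mathrm{node}}(T, v). \tag{11}$$
Let $\mathrm{Leaf}(T)$ denote the set of all leaf nodes in tree $T$. Assume that $m = |\mathrm{Leaf}(T)|$ is the number of nodes in $\mathrm{Leaf}(T)$ and that its elements are $l_1, l_2, \ldots, l_m$. Thus, the sensor node set can be divided into $m$ paths of leaf nodes; that is,
$$V = \bigcup_{i=1}^{m} P(T, l_i). \tag{12}$$
Based on this division of the node set $V$, formula (11) is equal to
$$\min_{T \in \mathcal{A}(G)} \max_{l_i \in \mathrm{Leaf}(T)} \mathrm{Load}_{\mathrm{path}}(T, l_i). \tag{13}$$
Therefore, the MLDGT problem is transformed into constructing a spanning tree in which the maximum path load of leaf nodes is minimal. Inspired by [19], in order to solve the problem of formula (13), we need to find a spanning tree in which the difference between the maximum path load and the minimum path load of leaf nodes is minimal. Therefore, the MLDGT problem can be described as follows:
$$\min_{T \in \mathcal{A}(G)} \Big( \max_{l_i \in \mathrm{Leaf}(T)} \mathrm{Load}_{\mathrm{path}}(T, l_i) - \min_{l_i \in \mathrm{Leaf}(T)} \mathrm{Load}_{\mathrm{path}}(T, l_i) \Big). \tag{14}$$
For a given connectivity graph $G$, we assume that tree $T_{\mathrm{opt}}$ is an optimal spanning tree that provides the maximum lifetime. Let $\varepsilon_{\mathrm{opt}}$ be the difference between the maximum path load and the minimum path load of leaf nodes in tree $T_{\mathrm{opt}}$; that is,
$$\varepsilon_{\mathrm{opt}} = \max_{l_i \in \mathrm{Leaf}(T_{\mathrm{opt}})} \mathrm{Load}_{\mathrm{path}}(T_{\mathrm{opt}}, l_i) - \min_{l_i \in \mathrm{Leaf}(T_{\mathrm{opt}})} \mathrm{Load}_{\mathrm{path}}(T_{\mathrm{opt}}, l_i). \tag{15}$$
Each connectivity graph has a unique $\varepsilon_{\mathrm{opt}}$, which depends on the nodes' initial energy, the network deployment, and the network connectivity of the WSN. Finding $\varepsilon_{\mathrm{opt}}$ for an arbitrary connectivity graph is an NP-complete problem.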
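The path load and the gap between the largest and smallest leaf path loads can be computed with a short sketch such as the one below. The parent map, the node loads, and the function names are hypothetical, and the sink is assumed to be excluded from the maximum because it has an unlimited power supply.

```python
def path_load(parent, load, node, sink="s"):
    """Maximum node load along the path from `node` up to (but excluding) the sink."""
    worst = 0.0
    while node != sink:
        worst = max(worst, load[node])
        node = parent[node]
    return worst

def leaf_path_gap(parent, load, sink="s"):
    """Difference between the largest and smallest path load over all leaf nodes."""
    leaves = set(parent) - set(parent.values())      # nodes that are nobody's parent
    loads = [path_load(parent, load, v, sink) for v in leaves]
    return max(loads) - min(loads)

# Hypothetical tree (child -> parent) and node loads.
parent = {"a": "s", "b": "a", "c": "a", "d": "b"}
load = {"a": 0.4, "b": 0.7, "c": 0.2, "d": 0.1}
print(path_load(parent, load, "d"))   # 0.7: bottleneck b lies on d's path
print(path_load(parent, load, "c"))   # 0.4: bottleneck a lies on c's path
```

Here the gap between the two leaf path loads is about 0.3, so this tree would satisfy a termination threshold of 0.3 or larger but not a smaller one.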
We can enumerate all spanning trees of graph $G$ and find a spanning tree that minimizes the maximum path load of leaf nodes. However, the number of spanning trees of graph $G$ can be an exponential function of the number of nodes $N$. Therefore, we try to find an approximate solution that minimizes the maximum path load of leaf nodes in the data-gathering tree. Details of the algorithm are described in the next section.

Algorithm Description.
In this section, we design an approximate algorithm to deal with the problem raised by formula (13). Our algorithm uses a randomized tree transformation and switching method together with an optimal parent node selecting strategy, which make full use of the energy consumption properties of CS-based data gathering to improve the convergence speed of the algorithm. Our algorithm starts from an arbitrary tree and then iteratively keeps transferring descendants of the bottleneck node to nodes with smaller path loads. In order to make the algorithm converge, we define a terminate parameter $\varepsilon$ as the maximum allowable difference in the path loads among all leaf nodes. If the current spanning tree $T$ satisfies the condition
$$\max_{l_i \in \mathrm{Leaf}(T)} \mathrm{Load}_{\mathrm{path}}(T, l_i) - \min_{l_i \in \mathrm{Leaf}(T)} \mathrm{Load}_{\mathrm{path}}(T, l_i) \le \varepsilon, \tag{16}$$
the iterative operation is terminated, and $T$ is the final result. Clearly, the closer the terminate parameter $\varepsilon$ is to $\varepsilon_{\mathrm{opt}}$, the closer the result of the algorithm is to the optimal solution. Therefore, our algorithm can perform well under a proper choice of the terminate parameter $\varepsilon$.
However, if the current spanning tree does not satisfy condition (16), we need to continue to adjust the load of the bottleneck node. We use Figure 3 to illustrate the details of our algorithm. Figure 3 shows a part of the spanning tree $T$. In Figure 3, the solid lines with arrows indicate the directions of data transmission in the tree, while the dotted lines without arrows represent the edges in graph $G$. The omitted parts of the tree are represented by the dotted lines with arrows. $T_{v_i}$ denotes the subtree rooted at node $v_i$.
In Figure 3, suppose that node $v_i$ has the highest load in the current tree $T$. We need to switch some descendant nodes of $v_i$ to a new parent node. Node $v_i$ has three children nodes, $v_{i+4}$, $v_{i+5}$, and $v_{i+6}$. Those nodes can be selected to switch to an alternate path. If node $v_{i+4}$ is selected, it may have two potential parent nodes, $v_{i+1}$ and $v_{i+2}$. Assume that node $v_{i+4}$ is switched to node $v_{i+1}$. After switching, the load of node $v_{i+1}$ increases and may become greater than the load of node $v_i$ by more than the terminate parameter $\varepsilon$. Consequently, the path load of leaf nodes in $T_{v_{i+1}}$ may exceed the path load of leaf nodes in $T_{v_i}$ by more than $\varepsilon$. In the next iteration, node $v_{i+4}$ may be switched back to node $v_i$. In such a case, we say that an oscillation has occurred. This phenomenon affects both whether the algorithm converges and how fast it converges. In order to reduce the influence of oscillation, we adopt two mechanisms: a random switching decision and an optimal parent node selecting strategy.
In the random switching decision, whether a node is selected for switching depends on its switching probability. For a node under consideration, we generate a random number, 0 or 1, according to its switching probability. If the random number is 1, the corresponding node is selected to switch; otherwise, the node is not selected. The higher the switching probability, the more likely the random number is 1. The initial switching probability of each node is 1/2. Thereafter, the switching probability $p(v_i)$ of node $v_i$ is decided by the number of times node $v_i$ has been selected for switching. The switching probability can be expressed as
$$p(v_i) = \left(\frac{1}{2}\right)^{n(v_i) + 1}, \tag{17}$$
where $n(v_i)$ denotes the number of times node $v_i$ has been switched. When a node has been selected for switching many times, an oscillation may be occurring at that node. We use formula (17) to give such a node a smaller switching probability, which reduces its chance of being selected and gives other nodes more opportunities. This method helps to reduce oscillation. In Figure 3, if node $v_{i+4}$ is switched many times and the algorithm cannot converge, the switching probability of node $v_{i+4}$ decreases, and the opportunities of $v_i$'s other children, such as $v_{i+5}$ or $v_{i+6}$, increase. This feature improves the convergence of the algorithm.
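The random switching decision can be sketched as below. The closed form used for the probability is an assumption derived from the two facts stated in the text, namely an initial probability of 1/2 and a halving after every switch; the function names are hypothetical.

```python
import random

def switching_probability(n_switches):
    """Probability after n switches: starts at 1/2 and halves with every switch."""
    return 0.5 ** (n_switches + 1)

def switching_decision(p, rng=random):
    """Return 1 (switch) with probability p, otherwise 0 (stay)."""
    return 1 if rng.random() < p else 0

print([switching_probability(n) for n in range(4)])   # [0.5, 0.25, 0.125, 0.0625]
```

A node that has already been switched three times is selected only about once in sixteen attempts, which leaves room for its siblings to be tried instead.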
We have used a random switching decision to select a switching node. We now design an optimal parent node selecting strategy for assigning an optimal parent node to the selected switching node. Intuitively, in graph $G$, each neighbor of a selected switching node could become its parent. In fact, however, only some of the neighbor nodes are qualified to be the parent. Because the path loads of some neighbor nodes are already high, switching to them may further increase their path loads, and oscillation may occur. Therefore, we should choose only some neighbor nodes as potential parent nodes. Suppose that $v_i$ is a switching node and its original path load in tree $T$ is $\mathrm{Load}_{\mathrm{path}}(T, v_i)$. If $v_i$ is switched to a neighbor node $v_j$, a new tree $T'$ is constructed, and the new path load of $v_i$ is $\mathrm{Load}^{v_j}_{\mathrm{path}}(T', v_i)$. If the new path load of $v_i$ is not smaller than the original path load, the neighbor node $v_j$ cannot become a potential parent node. The set of $v_i$'s potential parents can be determined as follows:
$$PP_{v_i} = \big\{ v_j \mid v_j \in \mathrm{Nei}_{v_i},\ \mathrm{Load}^{v_j}_{\mathrm{path}}(T', v_i) < \mathrm{Load}_{\mathrm{path}}(T, v_i) \big\}, \tag{18}$$
where $\mathrm{Nei}_{v_i}$ is the set of $v_i$'s neighbor nodes.
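In the neighbor set $\mathrm{Nei}_{v_i}$, there may be two kinds of nodes: forwarding nodes and aggregating nodes. As described in formula (4), when neighbor node $v_j$ is an aggregating node, the outgoing traffic of $v_j$ is a constant value, $M$. If $v_i$ switches to $v_j$, the outgoing traffic of $v_j$ is still the constant value $M$; that is,
$$D_{\mathrm{out}}(T', v_j) = D_{\mathrm{out}}(T, v_j) = M.$$
Consequently, the loads of the nodes on the path from $v_j$ to the sink do not change, and only the node load of $v_j$ changes. Therefore, the new path load is
$$\mathrm{Load}^{v_j}_{\mathrm{path}}(T', v_i) = \max\big\{ \mathrm{Load}_{\mathrm{node}}(T, v_i),\ \mathrm{Load}_{\mathrm{node}}(T', v_j),\ \mathrm{Load}_{\mathrm{path}}(T, v_j) \big\}, \tag{19}$$
where $\mathrm{Load}_{\mathrm{node}}(T', v_j)$ is the node load of $v_j$ in the new tree $T'$. Otherwise, if neighbor node $v_j$ is a forwarding node, the loads of the nodes on the path from $v_j$ to the sink change, up to the first node that is an aggregating node. We need to choose a node from the set $PP_{v_i}$ to be the ultimate parent of $v_i$. When node $v_i$ is switched to the ultimate parent, its new path load is the smallest; that is,
$$\mathrm{Par}_{v_i} = \arg\min_{v_j \in PP_{v_i}} \mathrm{Load}^{v_j}_{\mathrm{path}}(T', v_i). \tag{20}$$
In Figure 3, if node $v_{i+5}$ is selected to switch, $v_{i+5}$ has three neighbor nodes, $v_{i+1}$, $v_{i+2}$, and $v_{i+8}$. If nodes $v_{i+1}$ and $v_{i+2}$ satisfy $\mathrm{Load}^{v_{i+1}}_{\mathrm{path}}(T', v_{i+5}) < \mathrm{Load}_{\mathrm{path}}(T, v_{i+5})$ and $\mathrm{Load}^{v_{i+2}}_{\mathrm{path}}(T', v_{i+5}) < \mathrm{Load}_{\mathrm{path}}(T, v_{i+5})$, respectively, they may become $v_{i+5}$'s potential parents. Then, according to formula (20), we select whichever of $v_{i+1}$ and $v_{i+2}$ gives the smaller new path load as $v_{i+5}$'s ultimate parent.

Algorithm 1
Input: graph $G$, arbitrary spanning tree $T$, $\mathrm{Load}_{\mathrm{path}}(T, v)$ and $\mathrm{Load}_{\mathrm{node}}(T, v)$ for each $v \in V$, $c_{\max}$
Output: data-gathering tree $T$
(1) Set $c_v \leftarrow 0$ and $p(v) = 1/2$ for each $v \in V$;
(2) Let $v_k$ be the node with the highest load in tree $T$;
...
(9) Remove node $v_i$ from $Q$ in FIFO order;
...
(11) if $PP_{v_i} = \varnothing$ then
(12) $Q = Q \cup \mathrm{Ch}(v_i)$;
(13) else
(14) if SwitchingDecision($p(v_i)$) == 1 then
(15) select $\mathrm{Par}_{v_i}$ using formula (20);
(16) update $T$ by switching $v_i$ to $\mathrm{Par}_{v_i}$, update $\mathrm{Load}_{\mathrm{path}}(T, v)$ and $\mathrm{Load}_{\mathrm{node}}(T, v)$ for each $v \in V$;
...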
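The parent selection step can be sketched as a filter followed by a minimum. The sketch assumes that the new path load of the switching node has already been computed for every neighbor, and it requires a strict improvement over the original path load, as in the Figure 3 example; the function names and the numeric values are hypothetical.

```python
def potential_parents(new_path_load, old_path_load):
    """Neighbors whose adoption would strictly lower the switching node's path load."""
    return {u: l for u, l in new_path_load.items() if l < old_path_load}

def ultimate_parent(new_path_load, old_path_load):
    """Potential parent giving the smallest new path load, or None if there is none."""
    candidates = potential_parents(new_path_load, old_path_load)
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

# Hypothetical new path loads of a switching node for each of its three neighbors.
new_loads = {"v1": 0.6, "v2": 0.5, "v8": 0.9}
print(sorted(potential_parents(new_loads, old_path_load=0.8)))   # ['v1', 'v2']
print(ultimate_parent(new_loads, old_path_load=0.8))             # v2
```

Neighbor `v8` is rejected because switching to it would raise the path load, and `v2` is preferred over `v1` because it yields the smaller new path load.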

Details of the Algorithm.
The core of our algorithm is shown in Algorithm 1. In our algorithm, $c_{v_k}$ counts how many times node $v_k$ is selected as the node with the highest load in tree $T$, and $p(v_i)$ denotes the switching probability of node $v_i$. The initial values of $c_v$ and $p(v)$ for each $v \in V$ are set to 0 and 1/2, respectively. In line (2), $v_k$ denotes the node with the highest load in tree $T$, whose children nodes will be selected for switching. $c_{\max}$ denotes the maximum allowed number of times that a node can be selected for switching. It is an important parameter that affects the convergence of the algorithm. If the node $v_k$ with the highest load has been selected $c_{\max}$ times, the switching procedure is terminated (line (3)) and the data-gathering tree $T$ is returned (line (27)). Within the while loop, if the current spanning tree $T$ satisfies inequality (16), the iterative operations of the algorithm are terminated, and the data-gathering tree $T$ is returned (lines (4)-(5)). Otherwise, the children of $v_k$ are inserted in queue $Q$, and each node in queue $Q$ is qualified to be selected to switch. If queue $Q$ is not empty, we remove a node $v_i$ from queue $Q$ in FIFO order and then determine $v_i$'s potential parents set $PP_{v_i}$ using formula (18) (lines (9)-(10)). If node $v_i$ has no potential parent, the children nodes $\mathrm{Ch}(v_i)$ of $v_i$ are added to the queue $Q$ and considered for switching in subsequent rounds (line (12)). If $v_i$ has some potential parents, a random decision is made based on its current switching probability $p(v_i)$ through the switching decision function. If the outcome of the decision is to switch, a node $\mathrm{Par}_{v_i}$ is chosen as the parent of node $v_i$ through formula (20), and then the tree $T$ is updated by switching the subtree $T_{v_i}$ rooted at node $v_i$ to $\mathrm{Par}_{v_i}$. The new loads $\mathrm{Load}_{\mathrm{path}}(T, v)$ and $\mathrm{Load}_{\mathrm{node}}(T, v)$ of each $v \in V$ are updated (lines (14)-(16)). Because node $v_i$ is switched, its subsequent switching probability becomes half of its original value; that is, $p(v_i) = p(v_i)/2$ (line (17)). Since node $v_i$ is switched and the tree is updated, we terminate the inner while loop (line (18)) and then reselect a new node $v_k$ with the highest load (line (25)) for subsequent iterations. If the outcome of the decision is not to switch, node $v_i$ remains with its current parent and the children of $v_i$ are added to the queue for subsequent consideration (lines (19)-(20)).
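The overall procedure can be sketched in runnable form as follows. This is a simplified illustration and not the authors' Algorithm 1: the new path load is obtained by recomputing all loads in a trial tree instead of by incremental updates, potential parents must strictly lower the path load, and every function name, the toy topology, and the parameter values are assumptions.

```python
import random

def node_loads(parent, e_init, m, e_rx, e_tx):
    """Load of every sensor (per-round energy / initial energy) under hybrid-CS traffic."""
    children = {}
    for v, p in parent.items():
        children.setdefault(p, []).append(v)
    out = {}
    def d_out(v):                          # outgoing traffic: min(incoming + 1, M)
        if v not in out:
            out[v] = min(sum(d_out(c) for c in children.get(v, [])) + 1, m)
        return out[v]
    load = {}
    for v in parent:
        incoming = sum(d_out(c) for c in children.get(v, []))
        load[v] = (e_rx * incoming + e_tx * d_out(v)) / e_init[v]
    return load, children

def path_load(parent, load, v, sink):
    """Largest node load on the path from v to the sink."""
    worst = 0.0
    while v != sink:
        worst, v = max(worst, load[v]), parent[v]
    return worst

def subtree(children, root):
    """All nodes in the subtree rooted at `root` (switching into it would create a cycle)."""
    seen, stack = set(), [root]
    while stack:
        v = stack.pop()
        seen.add(v)
        stack.extend(children.get(v, []))
    return seen

def mldgt_sketch(adj, parent, e_init, m, e_rx, e_tx, eps, c_max, sink="s", rng=None):
    rng = rng or random.Random(0)
    parent = dict(parent)
    count = {v: 0 for v in parent}         # times each node was chosen as bottleneck
    prob = {v: 0.5 for v in parent}        # switching probabilities
    while True:
        load, children = node_loads(parent, e_init, m, e_rx, e_tx)
        leaves = [v for v in parent if v not in children]
        pl = [path_load(parent, load, v, sink) for v in leaves]
        bottleneck = max(load, key=load.get)
        if max(pl) - min(pl) <= eps or count[bottleneck] >= c_max:
            return parent
        count[bottleneck] += 1
        queue = list(children.get(bottleneck, []))
        while queue:
            v = queue.pop(0)               # FIFO order
            banned = subtree(children, v)
            best, best_load = None, path_load(parent, load, v, sink)
            for u in adj[v]:               # evaluate every admissible neighbor
                if u in banned:
                    continue
                trial = dict(parent)
                trial[v] = u
                t_load, _ = node_loads(trial, e_init, m, e_rx, e_tx)
                new = path_load(trial, t_load, v, sink)
                if new < best_load:
                    best, best_load = u, new
            if best is not None and rng.random() < prob[v]:
                parent[v] = best           # switch the subtree rooted at v
                prob[v] /= 2
                break                      # reselect the bottleneck
            queue.extend(children.get(v, []))

# Toy network: sensors a..f and sink s; initially almost everything hangs under a.
adj = {"a": ["s", "b", "c", "d"], "b": ["s", "a", "d", "e"], "c": ["a", "d", "f"],
       "d": ["a", "b", "c", "e", "f"], "e": ["b", "d"], "f": ["c", "d"]}
tree0 = {"a": "s", "b": "a", "c": "a", "d": "a", "e": "d", "f": "c"}
energy = {v: 1.0 for v in tree0}
tree1 = mldgt_sketch(adj, tree0, energy, m=2, e_rx=1.0, e_tx=2.0, eps=0.0, c_max=5)
before, _ = node_loads(tree0, energy, 2, 1.0, 2.0)
after, _ = node_loads(tree1, energy, 2, 1.0, 2.0)
print(max(before.values()), max(after.values()))
```

Because a switch is accepted only when it lowers the path load of the switched node, and the bottleneck lies on that node's original path, the maximum node load in this sketch never increases from one iteration to the next.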

Analysis of Algorithm.
In this section, we present the time complexity of our algorithm. The calculation of $\mathrm{Load}_{\mathrm{path}}(T, v)$ and $\mathrm{Load}_{\mathrm{node}}(T, v)$ for each $v \in V$ that is done during the initialization can be performed in $O(N)$ time. Lines (1)-(2) of our algorithm can be performed in $O(N)$ time. On line (4), since there can be at most $O(N)$ leaf nodes in the current spanning tree, the calculation of the maximum and minimum path load of leaf nodes takes at most $O(N)$ time. The initialization of $Q$ on line (7) takes $O(C_{\max})$ time, where $C_{\max}$ is the maximum number of children of any node. On line (10), we determine the potential parents of the selected switching node $v_i$. Because each neighbor of the switching node $v_i$ has the potential to become the parent, we need to calculate $v_i$'s new path load $\mathrm{Load}^{v_j}_{\mathrm{path}}(T', v_i)$ when $v_i$ switches to each neighbor $v_j$. This requires $O(D_{\max})$ calculations, where $D_{\max}$ is the maximum number of neighbors of a node in graph $G$. In order to determine $\mathrm{Load}^{v_j}_{\mathrm{path}}(T', v_i)$, we need to calculate the load of each node on the path from the selected switching node to the sink. It takes at most $O(d)$ time, where $d$ is the diameter of the network. Therefore, determining the potential parents on line (10) takes $O(d D_{\max})$ time. Updating $Q$ on lines (12) and (20) takes $O(C_{\max})$ time. Selecting the ultimate parent on line (15) takes at most $O(D_{\max})$ time, since the new path load has been calculated on line (10). On line (16), updating $\mathrm{Load}_{\mathrm{path}}(T, v)$ and $\mathrm{Load}_{\mathrm{node}}(T, v)$ takes $O(N)$ time. Since $C_{\max} \le N$ and $D_{\max} \le N$, the running time for operations from line (9) to line (22) is dominated by $O(d D_{\max} + N)$. Because the subtree $T_{v_k}$ rooted at node $v_k$ can have at most $N$ descendants, the second loop on line (8) can run $O(N)$ times. Therefore, the second loop takes $O(N(d D_{\max} + N))$ time. If all nodes periodically participate in switching, the loop on line (3) runs at most $N c_{\max}$ times. Hence, treating $c_{\max}$ as a constant, the total time complexity of our algorithm is $O(N^2 (d D_{\max} + N))$.

Performance Evaluation
In this section, we evaluate the performance of our algorithm through simulation experiments.

Simulation Setup.
We evaluate the performance of our algorithm using Matlab simulations. In our simulation, we randomly deploy sensor nodes in a square region of size 100 m × 100 m. The number of sensors is varied from 100 to 300. We consider three scenarios associated with different locations of the sink in the network. In Scenario 1, the sink is placed at the corner of the deployment area, that is, at the (0 m, 0 m) coordinate. In Scenario 2, the sink is placed at one side of the deployment area, specifically at the (50 m, 0 m) coordinate. In Scenario 3, the sink is placed in the middle of the deployment area, namely, at the (50 m, 50 m) coordinate. Each sensor is randomly assigned an initial energy between 0.5 and 1 joule (J). We assume that all nodes have the same energy consumption model. The radio transmission range of each node is set to 25 m. Similar to [18], the energy required to receive data is 50 nJ/bit and the energy required to transmit data is 100 nJ/bit. The size of one-unit data is 16 bytes. The energy consumption for one-unit data reception and transmission is therefore 6400 nJ and 12800 nJ, respectively; that is, Rx = 6400 nJ and Tx = 12800 nJ. We generate 100 random networks and present the average results for performing comparisons. We evaluate the lifetime performance measured as the number of data-gathering rounds until the first node runs out of energy.
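The per-unit energy constants follow directly from the per-bit costs and the data size, as the short calculation below shows. The variable names are illustrative, and the leaf-node example at the end is a hypothetical case added for scale.

```python
BITS_PER_BYTE = 8
unit_bits = 16 * BITS_PER_BYTE            # one-unit data: 16 bytes = 128 bits

rx_nj = 50 * unit_bits                    # reception at 50 nJ/bit
tx_nj = 100 * unit_bits                   # transmission at 100 nJ/bit
print(rx_nj, tx_nj)                       # 6400 12800

# Hypothetical leaf node with the minimum initial energy of 0.5 J: it only
# transmits its own reading each round, so it lasts about 39062 rounds.
leaf_rounds = 500_000_000 // tx_nj        # 0.5 J expressed in nJ
print(leaf_rounds)                        # 39062
```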
The choice of the terminate parameter $\varepsilon$ governs the convergence speed of the algorithm and the lifetime of the tree produced at the point of convergence. However, it is not possible to know the optimal value of $\varepsilon$ for a random connectivity graph. Intuitively, it may appear that choosing $\varepsilon$ as low as possible results in a tree with a longer lifetime. When $\varepsilon$ is too low, the algorithm performs more switching to find a spanning tree that meets the terminate parameter. However, such a spanning tree may not topologically exist in the connectivity graph that represents the sensor network. Oscillations may then occur and diminish the lifetime of the final tree. Therefore, we need to learn the appropriate ranges of $\varepsilon$ for different network configurations. To solve this problem, we generate 100 randomly deployed networks for each network size as training inputs to our algorithm. Then, we measure their lifetimes with varying $\varepsilon$ values and select the best-performing value for the simulation experiments. The maximum allowed number of switching times $c_{\max}$ is set to 1000.

Simulation Results.
In this section, we first compare our algorithm with the MITT [18], RaSMaLai [19], and MECDA [14] data-gathering schemes for Scenario 2. The lifetime performance is compared as a function of the number of nodes. The sampling rate is set to 10% and 20%.
It can be seen from Figure 4 that our algorithm outperforms the compared algorithms. Furthermore, the lower the sampling rate, the better the lifetime performance of our algorithm: when the sampling rate is low, more nodes in the network can become aggregating nodes, which saves energy. The MITT algorithm is based on a min-max-weight spanning tree and improves the lifetime of a given tree by switching the parent of the node under consideration. The MECDA algorithm uses the hybrid-CS technique, and its objective is to minimize the total energy consumption in a data-gathering tree. It does not consider balancing the energy consumption across nodes, so nodes near the sink may have higher energy consumption, which shortens the network lifetime. The RaSMaLai algorithm uses a randomized switching scheme to maximize the lifetime of data collection trees based on the concept of bounded balanced trees. However, because it collects the raw data directly, the nodes close to the sink need to relay the data of all the downstream nodes on the path, so their energy consumption is higher and the network lifetime is shorter. Our algorithm uses both the hybrid-CS strategy and the random switching method to adjust the load of the bottleneck node, thus obtaining the best lifetime performance. Figure 5 shows the lifetime of the different algorithms when the sampling rate is varied from 10% to 30% and the number of nodes is 200 and 300, respectively. As shown in Figure 5, as the number of nodes increases, the nodes near the sink need to forward more data, their energy consumption increases, and the lifetime is reduced. As the sampling rate increases, the lifetime of our algorithm and of MECDA decreases. This is because a higher sampling rate increases the maximum data length, so intermediate nodes must gather more data before they become aggregating nodes; the energy consumption of such nodes increases, and the network lifetime is reduced. However, the lifetime of RaSMaLai does not change with the sampling rate, because it does not use the CS strategy. When the sampling rate is over 30%, the performance of our algorithm approaches that of the RaSMaLai algorithm. The reason is that when the sampling rate is large, few nodes can become aggregating nodes, and the energy consumption characteristic becomes consistent with that of the RaSMaLai algorithm.
As shown in Figure 6, we compare our algorithm with MECDA in different scenarios. No matter which scenario is used, our algorithm outperforms the MECDA algorithm. In Scenario 1, the advantage of our algorithm is largest: when the sink is placed at the (0, 0) coordinate, the nodes around the sink need to forward more data from downstream nodes and carry heavy loads, and our algorithm can balance the load of such nodes by switching. However, in Scenario 3, the performance improvement of our algorithm is relatively small, because a sink located in the middle naturally gives better load balancing for every algorithm.

Conclusion
In this paper, we have studied the problem of constructing a maximum-lifetime data-gathering tree. We applied hybrid-CS theory to data gathering in wireless sensor networks. We first constructed an arbitrary data-gathering tree, used a random switching decision to select a child of the bottleneck node as the switching node, and then designed an optimal parent node selecting strategy for assigning an optimal parent to the selected switching node. We kept transferring descendants of the bottleneck node to balance the load until the terminate parameter was satisfied or the maximum allowed number of switching times was reached. Simulation results show that the proposed algorithm can significantly increase the lifetime of WSNs and outperforms several existing approaches in terms of network lifetime.