Data Collection for Time-Critical Applications in the Low-Duty-Cycle Wireless Sensor Networks

In low-duty-cycle wireless sensor networks (WSNs), nodes usually have two states: active and dormant. A wireless transmission can succeed only if both the sender and the receiver are awake. In this paper, we study the following problem: how fast can raw data be collected from all source nodes to a sink in low-duty-cycle WSNs with general topology? We give tight lower and upper bounds for this problem. We use TDMA scheduling on a single frequency channel and present centralized and distributed fast data collection algorithms that find an optimal solution in polynomial time when there are no interfering links. When interfering links exist, multichannel scheduling is introduced to eliminate them, and we propose a novel Receiver-based Channel and Time Scheduling (RCTS) algorithm to obtain the optimal solution. Extensive simulations based on real traces show that the proposed RCTS algorithm is significantly more efficient than link scheduling on one channel and achieves the lower bound. We also evaluate the proposed data collection algorithms and find that RCTS is time-efficient and suffices to eliminate most of the interference in both indoor and outdoor environments for moderate-size networks.


Introduction
In most wireless sensor networks (WSNs), how to improve the network robustness for data collection is an important issue. Two seemingly contradictory, yet related objectives of performance exist: prolonging the network lifetime due to limited power resources and reducing data latency. On the one hand, researchers have put much effort to save or harvest energy for wireless sensor nodes in order to prolong the lifetime of individual nodes since most wireless sensor nodes are powered by batteries. On the other hand, some application scenarios like surveillance [1] require a bound on the data collection latency to guarantee availability of accurate information about the sensing field; otherwise, collected data information may become irrelevant or even useless [2,3]. In many emergency applications, the collected sensor data are usually useful during a finite amount of time, after which the information may become irrelevant. The requirement is special in some time-critical application scenarios using WSNs, such as the application of monitoring the escapees' current locations and expanding situation of hazard areas in the emergency navigation system [2]. In network diagnosis, it is necessary to collect metrics periodically, such as residual energy levels of nodes and instant network conditions, which are required to be collected in real time [3]. In the above situations, sensor data and metric information are both asked to be collected from nodes to the sink as soon as possible. Hence, the maximum end-to-end delay for each bit must be limited within some acceptable time-efficient requirement.
One simple but efficient method to prolong the lifetime of WSNs is to decrease the duty cycle of individual nodes [4][5][6], such that they complete their sensing and transmission operations while awake and sleep for the rest of each time period. Hence, sensor nodes are normally designed to be periodically active in most monitoring applications [7,8]. However, this brings forth another challenge for minimizing end-to-end delay: any node on an end-to-end transmission path makes the path infeasible when it goes into the dormant state, so the end-to-end delay cannot be controlled easily. In addition, the low-duty-cycle mechanism causes more severe collisions because the available time for nodes to receive packets is notably reduced [6]. Several attempts have been made to decrease the end-to-end latency of WSNs. For instance, [9] utilized a collision-avoidance method to reduce interference among wireless transmissions, since interference causes many retransmissions due to packet loss, especially when a contention-based MAC protocol (e.g., CSMA) is used. Other work [10] showed that one of the main causes of traffic latency is the "wasted" sleeping time of the receiver.

Algorithm 1: Centralized Fast Data Collection algorithm.
Input: The given VGN over the expanded time.
Output: The optimal data collection paths with the minimum delay.
(1) Initialization: $\Delta = \max_{i=1}^{N-1} \{T_i^{\min}\}$, $f = 0$, $k = \lfloor \Delta / \text{cycle period} \rfloor$
(2) while $|f| < N - 1$ do
(3)     Update the sub-graph of VGN under the current time threshold.
(4)     while there is an $s$-$d$ path in the residual graph $G_f(\Delta)$ do
(5)         Let $P$ be a simple $s$-$d$ path in $G_f(\Delta)$
(6)         $f' = \mathrm{augment}(f, P)$
(7)         if $f'$ satisfies the constraints of the max-flow problem then
(8)             Update $f$ to be $f'$
(9)             Update the residual graph $G_f(\Delta)$ to be $G_{f'}(\Delta)$
(10)    Obtain the maximum flow $f$ of the VGN under $\Delta$ and the optimal flow paths.
(11)    $k = k + 1$
(12) Among the virtual active nodes of the sink that carry flow in the $k$th period, find the one with the largest active time.
(13) $T_{\min}$ = that largest active time.
In this paper, we focus on how to design Fast Data Collection algorithms to reach the minimum data collection delay in low-duty-cycle WSNs. We apply contention-free MAC protocols, that is, TDMA, to eliminate collisions and make the best of the limited link resources in compliance with node working schedules. In addition, since the network protocols can affect the end-to-end delay of each bit [10], we use a joint routing and scheduling cross-layer design to search the minimum data collection delay paths. To address fast data collection problem, we present both centralized and Distributed Fast Data Collection algorithms when no interfering links happen. If the interfering links happen, the multichannel scheduling is introduced to eliminate interfering links. Furthermore, a simple but efficient distributed channel and time scheduling method is proposed to search the minimum data collection delay paths. The main intellectual contributions of this work are summarized as follows.
(1) For the raw data collection, we define a minimum data collection delay (MDCD) problem and give both the upper and lower tight bounds on the data collection delay.
(2) We present a novel Centralized Fast Data Collection (CFDC) algorithm (Algorithm 1), applicable to any general topology, which achieves the lower bound and returns the optimal collection paths with the minimum delay in polynomial time.
(3) We further propose a Distributed Fast Data Collection (DFDC) algorithm based on the node's local information. It is proved that the DFDC algorithm can also obtain the optimal solution.
(4) In order to eliminate the interfering links and guarantee network robustness, we present a Receiver-based Channel and Time Scheduling (RCTS) algorithm to search for the minimum data collection delay paths. Trace-based evaluations show that RCTS suffices to eliminate most of the interference in both indoor and outdoor environments.
The rest of this paper is organized as follows. Section 2 briefly presents the related work. We present the network model and formulate our problem formally in Section 3. In Section 4, both the upper and lower bounds on the MDCD problem are given, and we propose a CFDC algorithm that obtains the optimal solution to the fast data collection problem when there are no interfering links. In Section 5, a DFDC algorithm is presented for the same setting. When interfering links exist, an RCTS algorithm that incorporates multichannel scheduling is proposed in Section 6 to search for the minimum data collection delay paths. Based on the data trace from the Berkeley Lab, we conduct extensive simulations in outdoor and indoor scenarios and analyze the results in detail in Section 7. We conclude the paper and discuss future work in Section 8.

Related Work
The contribution of our work lies in the intersection of two important cutting-edge research topics: (1) the function performance of data collection and (2) the low-duty-cycle working modes of sensor nodes. We review some related work in both topics as follows.
Data Collection. The objective of most WSN-related applications is to collect data from physical world, such as environmental surveillance [8,11] and vehicle and structural monitoring [12][13][14][15]. Some work [9,16,17] concentrated on how to improve the energy efficiency, time efficiency, and (or) reliability of deployed WSNs. In [16], the authors presented a practical, energy efficient, and reliable solution to the problem of periodic data collection. Reference [9] proposed a link scheduling algorithm finding the minimum delay schedule when the slot lengths of link are given. Basically, they investigated the tradeoff between the energy consumption and delay. By developing a novel and efficient TDMA schedule, [17] studied the effect of dynamic traffic patterns for data collection. However, for all the aforementioned work, the authors assumed that the data collection was applied in the case with constant availability of connectivity such that the proposed algorithms may lead to very low efficiency in low-duty-cycle sensor networks with sleep latency.
Low-Duty-Cycle. One simple but efficient method to prolong the lifetime of WSNs is to decrease the duty cycle of individual nodes, such that they complete their sensing and transmission operations while awake and sleep for the rest of each time period, since a node consumes almost no energy in the dormant state. Existing work has studied low-duty-cycle topics from different points of view, such as data flooding and broadcasting [10,18], sleep scheduling [5,19], and multiple-task scheduling [6]. In [10], the authors detailed an opportunistic flooding method for a low-duty-cycle WSN with unreliable links when the working schedules of all wireless sensor nodes are given. Reference [18] aimed to solve the latency-optimal broadcasting problem in duty-cycled multihop wireless networks and proposed three algorithms to improve the approximation ratios. They also proposed a forwarder selection method to alleviate the hidden terminal problem. The work in [19] modified random duty cycling to solve the slow diffusion problem. Reference [5] introduced energy-efficient sleep scheduling in order to minimize latency. The authors of [6] provided two efficient scheduling algorithms in order to balance network traffic. However, most of the above work on low-duty-cycle WSNs does not address the data collection function.
Some work, like [4,20], has objectives similar to ours. In [4], the authors presented a dynamic switch-based forwarding (DSF) scheme for extremely low-duty-cycle sensor networks with unreliable links. They studied the impact of both lossy radio links and sleep latency at the network layer. Although the DSF scheme can be applied to the data collection scenario in low-duty-cycle networks, it adopts a multipath routing strategy that inevitably suffers from packet duplication. In addition, frequently exchanging traffic statistics for route switching may introduce nonnegligible communication overhead. Reference [20] provided a cross-layer analysis framework for the end-to-end delay distribution in WSNs. Some MAC protocols, such as B-MAC [21], exploit network information like packet transmissions or routing paths to optimize their link scheduling. Reference [4] brought MAC layer information (like link quality) into the network layer, which leads to better link selection.
For interference elimination, the use of multiple channels has been studied extensively in both cellular and ad hoc networks. However, in the domain of wireless sensor networks, only limited research utilizes multiple channels [22].
To the best of our knowledge, no prior work has thoroughly studied delay-optimal routing for the data collection scenario in low-duty-cycle sensor networks without topology constraints. In this work, by taking the available time of link resources (working schedules) into consideration, we show that a cross-layer design can obtain a delay-optimal data collection path for each wireless sensor node, and we exploit multichannel scheduling to eliminate the interfering links.

Network Model and Problem Formulation
In this section, we first focus on a TDMA scheduling cross-layer design in which the nodes communicate on the same channel without any interfering link present. The case where interfering links exist is considered in Section 6, where we incorporate multichannel scheduling to eliminate the interfering links without compromising the time efficiency of data collection.

Network Model.
In this work, we consider a connected network of wireless sensor nodes working in a low-duty-cycle mode. Each sensor node has two states: active and dormant. When a node is in the active state, it can sense or receive packets from neighbor nodes. For the purpose of energy conservation, a node in the dormant state turns off all its functional modules except a timer to wake itself up. Under the low-duty-cycle working mode, if a node only delivered packets in its active state, the packets would hardly ever be sent out, since its active time may not overlap with that of its neighbors. To solve this problem, each node still receives packets only when it is in the active state, but the transmission rule is changed as follows: if a node has packets to forward outside its active time slots, it wakes up its transceiver and transmits the packets when its next-hop neighbor turns active [23].
To simplify the discussion, it is assumed that all nodes in the network work in the asynchronous duty-cycle mode and that all nodes except the sink are sources. The objective is to minimize the total time $T$ required to complete data collection. In the TDMA MAC protocol, $T$ is divided into a number of equal-length time slots, each of which is long enough to transmit one packet successfully. We also assume that each node wakes up to receive packets for only one time slot in every cycle, which is reasonable since the low-duty-cycle mechanism aims to reduce the waking time of individual nodes [24].
To represent the working schedule more clearly, we describe it as a binary string for each node, in which 1 means that the node is in the active state and 0 indicates that it is in the dormant state [4]. The schedule repeats with a fixed cycle period. For example, if the one-cycle schedule of a node is 0100 (a cycle period of 4 time slots) and the network collecting time is 10 time slots, the working schedule of the node extends to 0100010001. This assumption is the same as in previous work [4,10,23].
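The schedule extension in the example above can be sketched in a few lines of Python. This is only an illustration: the helper names `extend_schedule` and `active_slots` are assumptions introduced here, and slots are numbered from 1.

```python
from itertools import cycle, islice


def extend_schedule(pattern: str, horizon: int) -> str:
    """Repeat a one-cycle binary working schedule over `horizon` time slots."""
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of 0s and 1s")
    return "".join(islice(cycle(pattern), horizon))


def active_slots(pattern: str, horizon: int) -> list[int]:
    """Return the 1-indexed slots in which the node is awake."""
    extended = extend_schedule(pattern, horizon)
    return [t for t, bit in enumerate(extended, start=1) if bit == "1"]


schedule = extend_schedule("0100", 10)   # "0100010001"
awake = active_slots("0100", 10)         # [2, 6, 10]
```

With a cycle period of 4 and a collecting time of 10 slots, the node is awake in slots 2, 6, and 10.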
There are two common types of interference models: the graph-based protocol model and the SINR-based physical model. In the protocol model, the transmission from node $v_i$ to node $v_j$ is successful if, for every other node $v_k$ transmitting simultaneously, the following condition holds:

$$d(v_k, v_j) \ge (1 + \delta)\, d(v_i, v_j), \tag{1}$$

where $d(v_i, v_j)$ is the distance between node $v_i$ and node $v_j$ and $\delta$ is a parameter that ensures that concurrently transmitting nodes are far enough from the receiver to prevent interference. The physical model considers the cumulative effect of multiple concurrent transmissions. The transmission of a packet from node $v_i$ to node $v_j$ is successful when the ratio between the received signal strength at node $v_j$ and the cumulative interference caused by all other concurrent transmissions plus the background noise is greater than a certain threshold $\beta$; that is,

$$\mathrm{SINR}_{ij} = \frac{P_i\, g(d_{ij})}{\mathcal{N} + \sum_{k \ne i} P_k\, g(d_{kj})} > \beta, \tag{2}$$

where $P_i$ is the transmission power at node $v_i$, SINR indicates the signal-to-interference-plus-noise ratio, $\mathcal{N}$ is the background noise level, $g(\cdot)$ is the signal attenuation function, and $d_{ij}$ is the distance between nodes $v_i$ and $v_j$. Since all nodes use the same constant transmission power, we set $P_i = P_k = 1$. For simplicity of exposition, we use a simple distance-dependent path-loss model, $g(d_{ij}) = d_{ij}^{-\alpha}$, where the path-loss exponent $\alpha$ is a factor between 2 and 6, depending on the external wireless environment, such as humidity, temperature, and obstacles. It is assumed that the interference level in both models is static and does not change over time.
Reference [25] found that the use of the graph based model fails most in sparse network deployments with higher path-loss exponents, and the physical interference model is closer to the actual situation than the protocol model. Hence, we use the physical model in all the following evaluation cases.
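As a minimal sketch of the physical model described above, the following Python code checks whether a transmission succeeds under a distance-based path-loss model with unit transmission power. The function names, the 2D coordinates, and the default values of the path-loss exponent, noise level, and threshold are assumptions chosen for illustration only.

```python
import math


def path_gain(distance: float, alpha: float) -> float:
    """Distance-dependent attenuation: distance ** (-alpha)."""
    return distance ** (-alpha)


def sinr(sender, receiver, interferers, alpha=3.0, noise=1e-9, power=1.0):
    """Signal-to-interference-plus-noise ratio at `receiver`.

    `sender`, `receiver`, and each entry of `interferers` are (x, y) points;
    every node transmits with the same constant `power`.
    """
    signal = power * path_gain(math.dist(sender, receiver), alpha)
    interference = sum(
        power * path_gain(math.dist(k, receiver), alpha) for k in interferers
    )
    return signal / (noise + interference)


def transmission_succeeds(sender, receiver, interferers, beta=2.0, **kwargs):
    """A transmission succeeds when the SINR is greater than the threshold."""
    return sinr(sender, receiver, interferers, **kwargs) > beta


far_ok = transmission_succeeds((0, 0), (1, 0), [(10, 0)])   # interferer far away
near_ok = transmission_succeeds((0, 0), (1, 0), [(2, 0)])   # interferer next to receiver
```

In this toy setting, an interferer 9 distance units from the receiver leaves the link usable, while an interferer at the same distance as the sender drives the SINR below 1 and the link becomes an interfering link.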

Problem Formulation.
We define the MDCD problem as follows. Consider a WSN with $N$ sensor nodes in which the sink is the $N$th node and the other $N-1$ nodes are source nodes, each of which generates one packet. The collection delay is defined as the time elapsed from the generation of the packets until all $N-1$ packets reach the sink. The goal is to find data collection paths from each source to the sink with the minimum collection delay. EED($v_i$) denotes the end-to-end delay (EED) from source node $v_i$ to the sink over one or multiple hops. The objective is to minimize $T = \max_{i=1}^{N-1} \mathrm{EED}(v_i)$, $\forall v_i \in V$. Formally, let $x(i, j, t)$ be a 0-1 integer variable indicating whether node $v_j$ receives data from $v_i$ at time $t$. $t_j$ indicates the active time of node $v_j$, and $y(i, j, t)$ is also a Boolean variable denoting whether the packet is delivered by node $v_i$ at time $t$. Thus, the MDCD problem can be described as follows.

Problem: Minimizing Data Collection Delay
Objective: Minimize $T$, subject to constraints (3)-(5). Inequality (3) ensures that each node in its active time can receive only one packet from a neighbor and cannot receive and send data simultaneously; for example, the links in Figure 1(a) cannot be scheduled simultaneously. Inequality (4) guarantees that the same packet is delivered at most once by each node, which prevents routing loops in the network. Equation (5) prevents a node from receiving data while it is in the dormant state.
In terms of collision type, the links that cause interference are divided into two categories: intersecting links and interfering links [25]. The intersecting links, defined as links with a common destination as shown in Figure 1(b), cannot transmit in the same time slot since a sensor node has only one half-duplex radio transceiver. The constraint described by inequality (3) ensures that intersecting links do not exist in the data collection paths. The interfering links are links that create or face interference if they are scheduled simultaneously, which happens when two nodes send data simultaneously and the SINR at any receiver is not greater than the predefined threshold. Figure 1(c) shows an example where the dotted line represents interference. Interfering links should not be assigned the same time slot and channel. Since our goal is to minimize the number of time slots, the best option is to assign them the same time slot on nonconflicting channels. Hence, interfering links can be avoided by exploiting orthogonal frequencies [24,25], that is, by using multiple frequency channels to enable more concurrent transmissions. In the following, we first consider the MDCD problem in the case where nodes communicate on the same channel, with the goal of minimizing the data collection delay. Next, we incorporate multichannel scheduling to avoid the interfering links.

Centralized Fast Data Collection Algorithm
In this section, the goal is to address the MDCD problem. We first analyze the lower and upper bounds on the data collection delay and outline their proofs. We then introduce a novel concept, the virtual grid network (VGN), to convert the MDCD problem into a max-flow problem with special constraints. Based on the VGN, we propose a Centralized Fast Data Collection (CFDC) algorithm to obtain the optimal collection paths with the minimum delay in polynomial time when there are no interfering links.
Proof. The detailed proof can be found in our previous work [26].

Lemma 2.
The delay for data collection is tightly upper bounded by $\sum_{i=1}^{n} T_i^{\min}$, where $n$ is the number of source nodes and $T_i^{\min}$ denotes the minimum time that source node $v_i$ requires to deliver a packet to the sink without the effect of other packets.
Proof. The detailed proof can also be seen in our previous work [26].

Virtual Grid Network.
To make the node states and the packet transmission process clearer, we define and detail the concept of the VGN, inspired by the time-expanded network proposed in [4]. The edge construction rules of the VGN differ from those of the time-expanded network. We first show how to construct the VGN from the original directed communication graph $G$, which is the foundation of the data collection method proposed later.
For ease of presentation, we first classify the nodes in $G$ into three types according to their roles.
(i) Leaf node: only acts as a source node; that is, it transmits its own sensor reading.
(ii) Intermediate node: acts as both a source node and a relay node; that is, it not only transmits its own sensor reading but also receives packets and tries to forward them when its neighbors are awake. Its own generated packet has higher transmission priority than the forwarded packets.
(iii) Sink node: responsible for receiving packets.
A deployed WSN is transformed into a communication graph $G = (V, E)$, where $V$ denotes the node set with cardinality $N$ and $E$ is the edge set. Assuming that the working schedule of each wireless node $v_i \in V$ is given, we build a VGN according to the following rules.
(1) For each node $v_i$ active at time $t \in [0, T]$, we build a virtual active node $v_{i,t}$ in the VGN.
(2) When node $v_i$ is a leaf node and has a directed edge to node $v_j$ in $G$, if the first active time of $v_i$ is $t$ and an active time $t'$ of node $v_j$ is after time $t$, we add a directed edge from $v_{i,t}$ to $v_{j,t'}$ in the VGN. Since a leaf node only transmits its own sensor reading, $v_i$ cannot have a newly arrived packet at any active time other than the first one.
(3) When node $v_i$ is an intermediate node and has a directed edge to node $v_j$ in $G$, if the active times of nodes $v_i$ and $v_j$ are $t$ and $t'$ ($t, t' \in [0, T]$), respectively, and $t' > t$, we add a directed edge from $v_{i,t}$ to $v_{j,t'}$ in the VGN.
For raw data collection, the MDCD problem can be solved optimally by reducing it to the max-flow problem. In order to formulate a max-flow problem over the VGN, we introduce two virtual vertices, the super source and the super sink, denoted by $s$ and $d$, respectively, to represent the source and destination of the total flow over the graph. We add the following rules to connect $s$ and $d$ to the virtual active nodes.
(4) When node $v_i$ is a sink, we connect all its corresponding virtual active nodes to the super sink $d$.
(5) We build edges from the super source $s$ to the first active virtual node of each source node.
Here, the first rule establishes all nodes of the VGN. The remaining rules describe how to build the edges that connect nodes in the VGN according to $G$. We take the simple best-case network of Figure 2 as an example to walk through the VGN construction (shown in Figure 4).
In the following, we detail how to construct VGN graph in terms of original network topology and working schedule of each node. First, we set the nodes in VGN based on the working schedule of each node in by Rule 1 and the red nodes in VGN represent the virtual active nodes. Next, we describe how to construct corresponding edges in VGN according to Rules 2-5.
The leaf node $v_1$ can send a packet to its neighbor $v_3$ when $v_3$ is active. Thus, we establish the edges $v_{1,1} \to v_{3,2}$, $v_{1,1} \to v_{3,6}$, and $v_{1,1} \to v_{3,10}$ in the VGN by Rule 2. The other leaf node $v_2$ can also send a packet to its neighbor $v_3$ when $v_3$ is active. According to the same rule, we build the edges $v_{2,4} \to v_{3,6}$ and $v_{2,4} \to v_{3,10}$ in the VGN.
For the sink $v_4$, we build the edges from the corresponding virtual active nodes to the super sink $d$ by Rule 4, that is, $v_{4,3} \to d$, $v_{4,7} \to d$, and $v_{4,11} \to d$. Finally, we connect the super source $s$ to the first active virtual node of each source node by Rule 5, that is, $s \to v_{1,1}$, $s \to v_{2,4}$, and $s \to v_{3,2}$. The whole mapping process for one of the best cases is then complete, as shown in Figure 4.
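The construction rules can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used in the evaluation: the function names, the string labels "s" and "d" for the super source and super sink, and the one-cycle schedules of the four nodes (chosen so that the first active slots match the walk-through, with a cycle period of 4) are all assumptions.

```python
def active_times(pattern, horizon):
    """1-indexed slots in [1, horizon] where a cyclic 0/1 schedule is active."""
    return [t for t in range(1, horizon + 1)
            if pattern[(t - 1) % len(pattern)] == "1"]


def build_vgn(schedules, links, sink, horizon):
    """Return the VGN edge set built with Rules 1-5.

    Virtual active nodes are (node, slot) pairs; "s" and "d" stand for the
    super source and the super sink.
    """
    active = {v: active_times(p, horizon) for v, p in schedules.items()}  # Rule 1
    receivers = {j for _, j in links}
    edges = set()
    for i, j in links:
        # Rule 2: a leaf (never a receiver) only sends from its first active slot.
        # Rule 3: an intermediate node may send from any of its active slots.
        starts = active[i] if i in receivers else active[i][:1]
        for t in starts:
            for t2 in active[j]:
                if t2 > t:
                    edges.add(((i, t), (j, t2)))
    for t in active[sink]:                       # Rule 4
        edges.add(((sink, t), "d"))
    for v, times in active.items():              # Rule 5
        if v != sink and times:
            edges.add(("s", (v, times[0])))
    return edges


# Assumed schedules: node 1 first wakes in slot 1, node 2 in slot 4,
# node 3 in slots 2, 6, 10, and the sink (node 4) in slots 3, 7, 11.
schedules = {1: "1000", 2: "0001", 3: "0100", 4: "0010"}
links = [(1, 3), (2, 3), (3, 4)]
vgn = build_vgn(schedules, links, sink=4, horizon=11)
```

For this example the sketch yields 17 edges: five from the two leaf nodes to node 3, six from node 3 to the sink, three into the super sink, and three out of the super source.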

The Max-Flow Problem with Transmission Constraints.
Given the VGN, our next step is to formulate an optimization problem whose objective is to maximize the flow from $s$ to $d$, that is, the largest number of source nodes from which the sink can collect data successfully within the given time duration $T$. Let $f(*, *)$ denote the binary traffic flow over an edge connecting two vertices in the VGN. Denoting by $F$ the total flow from the source to the destination, our objective is to maximize $F$. The data collection delay is defined as the total time required to complete data collection, which depends on the maximum EED over all source nodes. Minimizing the maximum EED is equivalent to solving the max-flow problem in the VGN, which must take into account several constraints due to, for example, interference and the half-duplex transceiver limitation. We detail these constraints below.

Constraints
Nonnegative Flow and Flow Conservation. The flow on every existing edge must be greater than or equal to zero. In addition, for any vertex in the VGN, the amount of flow entering the vertex must equal the amount of outgoing flow. Mapped to the VGN, this constraint is expressed over the set of node $v_i$'s neighbors in the original network $G$.
Half-Duplex Transceiver Limitation. Due to hardware limitations, a node cannot transmit and receive packets simultaneously, as shown in Figure 1(a). Mapped to the VGN, this constraint is expressed with a binary function over edges that is equal to 1 if the edge carries a flow and 0 otherwise. We notice that the case shown in Figure 1(a) happens only when two nodes are neighbors in the original topology and are active at the same time. Meanwhile, a node cannot receive from more than one neighbor at the same time, as shown in Figure 1(b). For this constraint, we set the capacity of each virtual active node $v_{i,t}$ in the VGN to one unit. We can deduce that the flow on each edge in the VGN is likewise at most one unit. Since we assume that the interfering links are eliminated, the formulation of the max-flow problem is complete.

Minimum Data Collection Delay Algorithm.
After converting the MDCD problem into a max-flow problem, we present a novel CFDC algorithm inspired by the Ford-Fulkerson max-flow method to solve the MDCD problem and obtain the optimal solution in polynomial time.
In the CFDC algorithm, the working schedule of each node needs to be known by the sink at network initialization, which can be easily achieved through the exchange of hello messages. The initial expanded time of the VGN is set to the maximum, over all source nodes, of the minimum time for each packet to be delivered to the sink, computed by Dijkstra's algorithm. The maximum flow is set to zero at the initialization step. Each iteration of the outermost while-loop corresponds to one phase. In each phase, the time threshold is increased monotonically in steps of one cycle period. The phases continue until the maximum flow satisfying all constraints equals the number of source nodes. This indicates that all packets can be scheduled within the time threshold without collision. The result is then the set of optimal collection paths for all source nodes.
Proof. The max-flow problem is to find the maximum flow from the source $s$ to the destination $d$. In the VGN, the neighbors of the super sink are the virtual active nodes of the sink. Therefore, in order to maximize the flow from $s$ to $d$, the virtual active nodes of the sink carry as much flow as possible. When all virtual active nodes of the sink carry flow, the sink keeps receiving packets in all of its active time slots. In this case, the CFDC algorithm returns the optimal collection paths with the minimum delay, that is, the first active time of the sink plus $(n - 1)$ cycle periods, where $n$ is the number of source nodes. In the best case illustrated above, the optimal collection paths returned by the CFDC algorithm are shown in bold in Figure 4, and the minimum collection delay achieves the lower bound. Thus, the CFDC algorithm obtains the optimal collection paths with the minimum delay. Due to the half-duplex transceiver constraints, it is sometimes impossible for all virtual active nodes of the sink to carry flow, as in the worst case mentioned above (Figure 3). The algorithm still obtains the optimal solution in that case.
Figure 5 shows the optimal collection paths returned by the CFDC algorithm for this worst case. However, the returned minimum delay does not achieve the lower bound, since the lower bound is attained only in the best case.
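The phase structure of CFDC can be illustrated with the following simplified Python sketch. It is not the authors' implementation and makes several assumptions that should be read as such: the horizon starts at one cycle period rather than at the Dijkstra-based initial threshold, augmenting paths are found by breadth-first search (the Edmonds-Karp variant of Ford-Fulkerson), each virtual active node is split into an "in" and an "out" copy to enforce its unit capacity, and the half-duplex check is omitted because no two neighbors in the example share an active slot. All function names and the example schedules are hypothetical.

```python
from collections import defaultdict, deque


def active_times(pattern, horizon):
    return [t for t in range(1, horizon + 1)
            if pattern[(t - 1) % len(pattern)] == "1"]


def build_residual(schedules, links, sink, horizon):
    """Unit-capacity residual graph of the VGN with node splitting."""
    cap = defaultdict(dict)

    def arc(u, v):
        cap[u][v] = 1
        cap[v].setdefault(u, 0)  # reverse arc used during augmentation

    active = {v: active_times(p, horizon) for v, p in schedules.items()}
    receivers = {j for _, j in links}
    for v, times in active.items():
        for t in times:
            arc((v, t, "in"), (v, t, "out"))      # node capacity of one unit
    for i, j in links:
        starts = active[i] if i in receivers else active[i][:1]
        for t in starts:
            for t2 in active[j]:
                if t2 > t:
                    arc((i, t, "out"), (j, t2, "in"))
    for t in active[sink]:
        arc((sink, t, "out"), "d")                # sink copies to super sink
    for v, times in active.items():
        if v != sink and times:
            arc("s", (v, times[0], "in"))         # super source to sources
    return cap


def max_flow(cap, s="s", d="d"):
    """Augment along shortest residual paths until none is left."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and d not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if d not in parent:
            return flow
        v = d
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


def min_collection_delay(schedules, links, sink, cycle, max_cycles=100):
    """Grow the horizon one cycle at a time until every source is served."""
    sources = len(schedules) - 1
    for k in range(1, max_cycles + 1):
        horizon = k * cycle
        cap = build_residual(schedules, links, sink, horizon)
        if max_flow(cap) == sources:
            # The delay is the latest sink slot that carries flow.
            return max(t for t in active_times(schedules[sink], horizon)
                       if cap[(sink, t, "in")][(sink, t, "out")] == 0)
    raise RuntimeError("no feasible schedule within max_cycles")


delay = min_collection_delay(
    {1: "1000", 2: "0001", 3: "0100", 4: "0010"},
    [(1, 3), (2, 3), (3, 4)], sink=4, cycle=4)
```

For the four-node example with three sources, the sink must receive in three of its active slots (3, 7, and 11), so the sketch returns a delay of 11 time slots.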

Lemma 4. The total time complexity of the CFDC algorithm is $O(N^3 T^3)$. Here, $T$ is the minimum time threshold at which the maximum flow is equal to the number of source nodes.
Proof. The detailed proof can be seen in [26].

Distributed Fast Data Collection Algorithm
Since CFDC needs the cycling information of all nodes, it is not practical to implement it in large-scale networks. In this section, we present the distributed version of our Fast Data Collection algorithm. We first analyze the generic principle for link scheduling and then propose a Distributed Fast Data Collection (DFDC) algorithm to reduce the beacon overhead. Finally, we present the theoretical analysis of DFDC.

Algorithm 2: Distributed Fast Data Collection algorithm.
(1) source-node.buffer = full
(2) Compute $p(v)$ for each node $v \in V$
(3) Sort nodes in the ascending order of $p(v)$
(4) while any node's buffer is full do
(5)     for each node $v$ in the sorted order do
(6)         if $v$.buffer == full then
(7)             for each of $v$'s neighbors that is active do
(8)                 if $p(v.\text{neighbor}) < p(v)$ and $v$.neighbor.buffer == empty then
(9)                     $v$.neighbor is node $v$'s next hop
(10)                    $v$.neighbor.buffer = full
(11)                    $v$.buffer = empty

Design Philosophy.
CFDC is a link-based method and is difficult to perform in a distributed environment, because it needs the information of all potential links to construct the VGN, which incurs large overhead. We aim to design a node-based method to solve the MDCD problem, in which each node makes decisions by itself; that is, the node decides locally what its next hop is and when the data is delivered. The main ideas are as follows: (1) Keep the sink busy receiving packets for as many active time slots as possible. Because the sink is duty-cycled, we need to guarantee that the nodes within one hop of the sink always have packets waiting to be sent. (2) Give higher sending priority to nodes that are nearer to the sink. The distance from a node to the sink is represented by its number of hops to the sink, computed by Dijkstra's shortest path algorithm.
Definition 5. The pressure index of node $v \in V$, denoted by $p(v)$, is the minimum number of hops from node $v$ to the sink computed by Dijkstra's shortest path algorithm, which represents the distance to the sink.
Since nodes far from the sink have more difficulty delivering packets to the sink, they have higher pressure to deliver their packets.
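Because every hop counts as one unit, the pressure index can be computed with a breadth-first search from the sink, which gives the same result as Dijkstra's algorithm on unit weights. The sketch below uses a small five-node topology assumed for illustration (node 5 is the sink); the function name `pressure_index` is likewise an assumption.

```python
from collections import deque


def pressure_index(neighbors, sink):
    """Minimum hop count from every reachable node to the sink (BFS)."""
    index = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in index:
                index[v] = index[u] + 1
                queue.append(v)
    return index


# Assumed topology: 1-3, 2-4, 3-5, 4-5, with node 5 as the sink.
neighbors = {1: [3], 2: [4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
index = pressure_index(neighbors, sink=5)
```

Here nodes 3 and 4 have pressure index 1, nodes 1 and 2 have pressure index 2, and the sink has pressure index 0.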

Algorithm Description.
The formal description of DFDC is shown in Algorithm 2.
Each source node keeps a buffer and its corresponding state, which is logical, either full or empty. The proposed DFDC algorithm does not require large buffers, because it is guaranteed that at any time the buffer will store not more than one packet. We initialize that all source nodes' buffers are full and keep the sink's buffer always empty for ease of explanation.
We first sort all nodes based on the pressure index. In each time slot, we go through each node ∈ in the sorted order, and packets are pushed step by step from nodes with higher ( ) to nodes with lower ( ), until all packets are delivered to the sink. When a node decides its next hop, it communicates with its neighbors and finds the ones that satisfy two constraints: (1) the neighbor's ( ) is lower than the current node's; (2) the neighbor's buffer is empty. We call such neighbors conditional neighbors. The search for conditional neighbors has three possible outcomes:
(1) Exactly one neighbor meets both constraints; this neighbor is selected as the next hop.
(2) More than one neighbor satisfies both constraints; one of them is chosen at random as the next hop.
(3) There are no conditional neighbors; the node then waits for another time slot until it has a conditional neighbor to which it can send its packet.
In Figures 6(a) and 6(b), we show the results of link scheduling in one of the best cases and one of the worst cases. Both of the original networks shown in Figure 6 contain four source nodes. The solid lines represent potential links when the receivers are active. The numbers beside the links represent the time slots at which the links are scheduled to send packets, and the numbers inside the circles denote the nodes' IDs.
Figure 6: Link schedule using DFDC algorithm: (a) One of the best cases. (b) One of the worst cases.
We run through the example shown in Figure 6(a) to explain the DFDC algorithm. We first compute the pressure index of each node and obtain (1) = (2) = 2, (3) = (4) = 1, and (5) = 0. In the first duty cycle, since only the buffer of the sink (node 5) is empty and (5) is smaller than (4), we schedule the link (4, 5) when node 5 is active. Thus, the sink receives a packet from node 4 in slot 3. In the second duty cycle, the buffers of node 3 and node 5 are empty and (3) < (1), (5) < (4); thus we schedule the links (1, 3) and (4, 5) in slots 6 and 7, respectively. Then the buffers of nodes 1 and 4 are empty. In the third duty cycle, node 2's buffer is full and the buffer of one of its neighbors, node 4, is empty. Thus we schedule the link (2, 4) in slot 9. For the same reason, we schedule the link (3, 5) in slot 11. Hence, the buffers of nodes 2 and 3 are empty. In the fourth duty cycle, only node 4's buffer is full. Hence we schedule the link (4, 5) in slot 15. This process continues until all the packets are delivered to the sink, yielding a link schedule that requires 15 time slots. In Figure 6(b), we show an assignment in one of the worst cases when all the interfering links are eliminated, yielding a schedule length of 27 time slots.
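The scheduling rules above can be sketched as a slot-by-slot simulation. This is a simplified illustration under stated assumptions, not the authors' Algorithm 2: every node has one active slot per cycle, a transmission happens in the receiver's active slot, a node is involved in at most one transmission per slot, and ties between senders are broken by pressure index and then by node ID. The topology, the wake-up slots, and all names (`dfdc_schedule`, `wake_slot`, and so on) are hypothetical.

```python
import random

def dfdc_schedule(adjacency, wake_slot, pressure, sink, period,
                  max_slots=1000, seed=0):
    """Return a list of (slot, sender, receiver) transmissions."""
    rng = random.Random(seed)
    full = {v: v != sink for v in adjacency}       # source buffers start full
    # Nodes nearer to the sink get priority (assumed tie-break: node ID).
    order = sorted((v for v in adjacency if v != sink),
                   key=lambda v: (pressure[v], v))
    schedule = []
    for slot in range(1, max_slots + 1):
        if not any(full.values()):
            break                                   # every packet reached the sink
        phase = (slot - 1) % period + 1             # position inside the duty cycle
        snapshot = dict(full)                       # buffer states at slot start
        busy = set()                                # nodes already sending/receiving
        for u in order:
            if not snapshot[u] or u in busy:
                continue
            # Conditional neighbors: awake now, lower pressure index, empty buffer.
            candidates = [v for v in adjacency[u]
                          if wake_slot[v] == phase
                          and pressure[v] < pressure[u]
                          and v not in busy
                          and (v == sink or not snapshot[v])]
            if not candidates:
                continue                            # case (3): wait for a later slot
            v = rng.choice(candidates)              # cases (1) and (2)
            schedule.append((slot, u, v))
            busy.update((u, v))
            full[u] = False
            if v != sink:
                full[v] = True
    return schedule

# Hypothetical five-node network with a cycle of 4 slots (assumptions).
adjacency = {1: [3], 2: [4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
pressure = {1: 2, 2: 2, 3: 1, 4: 1, 5: 0}
wake_slot = {1: 4, 2: 4, 3: 2, 4: 1, 5: 3}
schedule = dfdc_schedule(adjacency, wake_slot, pressure, sink=5, period=4)
for entry in schedule:
    print(entry)
```

With these assumed wake-up slots the sink receives one packet in each of four consecutive duty cycles and the last transmission falls in slot 15. The individual links chosen in each cycle depend on the tie-breaking rule, so they need not coincide with the schedule drawn in Figure 6(a).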

Algorithm Analysis.
In the following, we prove that the DFDC algorithm can obtain the optimal solution when no interfering links happen. Before giving the detailed proof, we first highlight the two main principles of the algorithm: (1) The sink is kept receiving packets for as many time slots as possible. (2) A node's buffer is not empty for two or more consecutive duty cycles as long as the buffers of one or more nodes with higher ( ) are full. The first one follows directly from the scheduling rules of the DFDC algorithm. We prove the second one in the following lemma (Lemma 6).
Proof. We prove it by induction on time slot = + 1. We focus on the first slot of each duty cycle. At = 1, the lemma holds trivially because all buffers of the source nodes are full. Suppose the lemma holds for = + 1; that is, for slot = + 1 the buffers of all nodes with ( ) = 1 are empty, and for slots = ( − 1) + 1 and = ( + 1) + 1 there exists at least one node with ( ) = 1 whose buffer is full. At = ( +1) +1, one node ( ) = 1 whose buffer is full sends a packet to the sink, and its buffer becomes empty. There exist the following two cases: (1) all nodes with ( ) = 1 whose buffers are empty and at least one node with ( ) = 1 whose buffer is full at the following slot = ( + 2) + 1. (2) another node with ( ) = 1 whose buffer is full, which is noted as node .
The first case means that, for slot = + 1, there is only one node with ( ) = 1 whose buffer is full. Since one or more nodes with ( ) > 1 have full buffers, the nodes with ( ) = 1 will receive at least one packet during the next duty cycle.
The second case means that, for slot = + 1, there is more than one node with ( ) = 1 whose buffer is full. For slot = ( + 1) + 1, one node with ( ) = 1 whose buffer is full sends a packet to the sink. Since the sink only receives one packet during each duty cycle, there is another node with ( ) = 1 whose buffer is full. Therefore, the lemma holds for = ( + 1) + 1, and the proof follows.
Theorem 7. DFDC algorithm can obtain the optimal data collection paths with the minimum delay, that is, ( )+( − 1) .
Proof. The principle of the DFDC algorithm is to keep the nodes with ( ) = 1 always waiting to send a packet. From Lemma 6, we know that the sink keeps receiving a packet during two consecutive duty cycles. Therefore, the DFDC algorithm can obtain the optimal data collection paths with the minimum delay. Note that the lower bound of the best case illustrated in Figure 6 is (sink) + ( − 1) = 3 + 3 × 4 = 15, the same as the result of the proposed DFDC algorithm.
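The arithmetic of this bound can be reproduced with a short sketch. The reading assumed here is that the first packet arrives in the sink's first active slot and that each remaining packet needs one further duty cycle, because the sink receives at most one packet per cycle. The function and parameter names are hypothetical.

```python
def collection_delay_lower_bound(sink_first_active_slot, num_packets, cycle_length):
    """Lower bound on the collection delay, in time slots.

    Assumed reading: the first packet arrives in the sink's first active
    slot, and every further packet costs one additional duty cycle.
    """
    return sink_first_active_slot + (num_packets - 1) * cycle_length

# Best case of Figure 6: sink active in slot 3, four packets, cycle of 4 slots.
bound = collection_delay_lower_bound(3, 4, 4)
print(bound)  # 15
```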

Impact of Interference
So far, we have considered scheduling methods that assign the same channel to all the receivers. Since communications on the same channel cannot avoid interfering links, a collision happens if the SINR value at any receiver is not greater than the predefined threshold, especially when two or more transmissions are launched in the same time slot. In this section, we combine channel scheduling with time scheduling to eliminate the interfering links. As shown in Figure 6(b), links (4, 5) and (1, 3) at slot 11 as well as links (4, 5) and (2, 3) at slot 19 are interfering links. If interfering links are scheduled on the same channel, serious data loss results. We give experimental results to show how serious the loss is as the node density rises. A varied number of nodes are deployed randomly in a 100 m square field, and we evaluate the delivery ratio as the node density increases. From Figure 7, we observe that the delivery ratio deteriorates when all communications are scheduled on the same channel, especially at high network density. The main reason is that interfering links cause massive collisions. The easy way to avoid the interfering links is to assign extra time slots; however, this results in a large data collection delay. If all the interfering links are present, the total schedule length will extend to 35 slots for the case in Figure 6(b). To avoid the interfering links without compromising the time efficiency of data collection, we use multiple frequency channels.
We propose a simple Receiver-based Channel and Time Scheduling (RCTS) algorithm, which assigns time slots and channels based on the DFDC algorithm. Since a data transmission depends on the time slot in which the receiver wakes up, we first use the DFDC algorithm to assign links on the same channel. Next, for the receivers working in the same slot, the algorithm checks whether any receiver is interfered according to the SINR threshold and iteratively assigns the next available channel, starting from the interfered receiver with the lowest ( ). Note that we assume there are adequate channels available to be allocated to the interfered receivers.
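The channel-assignment step can be sketched for the links scheduled in a single time slot. This is only an illustration under stated assumptions: received power follows a distance power law with unit transmit power, the noise floor is an arbitrary small constant, and the threshold of −3 dB is the value the indoor simulations use. The node coordinates and all function names are hypothetical.

```python
import math

def sinr_db(rx, tx, interferers, pos, alpha, noise=1e-9):
    """SINR at receiver rx in dB, using received power = distance ** (-alpha)."""
    def gain(a, b):
        return math.dist(pos[a], pos[b]) ** (-alpha)
    interference = sum(gain(i, rx) for i in interferers)
    return 10 * math.log10(gain(tx, rx) / (noise + interference))

def assign_channels(links, pos, pressure, alpha=2.0,
                    threshold_db=-3.0, max_channels=16):
    """links: concurrent (sender, receiver) pairs of one slot.

    Returns {receiver: channel}. All receivers start on channel 0; while
    some receiver is interfered, the one with the lowest pressure index
    moves to its next channel.
    """
    channel = {rx: 0 for _, rx in links}
    while True:
        interfered = []
        for tx, rx in links:
            # Only senders whose receivers share rx's channel interfere.
            others = [t for t, r in links
                      if r != rx and channel[r] == channel[rx]]
            if others and sinr_db(rx, tx, others, pos, alpha) <= threshold_db:
                interfered.append(rx)
        if not interfered:
            return channel
        rx = min(interfered, key=lambda r: (pressure[r], r))
        if channel[rx] + 1 >= max_channels:
            raise RuntimeError("not enough channels for this slot")
        channel[rx] += 1

# Hypothetical positions in metres for two concurrent links (assumption).
pos = {5: (0, 0), 4: (10, 0), 3: (14, 0), 1: (24, 0)}
pressure = {5: 0, 3: 1}
channels = assign_channels([(4, 5), (1, 3)], pos, pressure)
print(channels)
```

In this example the sender of link (4, 5) sits only 4 m from receiver 3, so receiver 3 falls below the threshold on the shared channel and is moved to the next channel, while the sink stays on the first one.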

Performance Evaluation
In this section, we evaluate the impact of the duty cycle and of multiple channels in both outdoor and indoor environments under the physical interference model.

Simulation Setup.
A sink is positioned in the center of the deployment field, and each sensor node sends its packet to the sink over single or multiple hops. The working schedules of all the nodes are predefined by picking one active slot per node at random within a cycle period, and they remain fixed throughout the data collection process. We investigate the network performance with different duty cycles for the proposed RCTS algorithm and for the centralized MDCD algorithm that assigns extra time slots to eliminate interfering links (denoted MDCD-IL), compared with one of the well-known SPR algorithms, namely, the Dijkstra Algorithm [27]. Since the Dijkstra Algorithm aims to determine the path with minimum delay for end-to-end communications, collisions occur when it is extended to many-to-one data collection. Thus, retransmission is adopted as the collision recovery strategy for the Dijkstra Algorithm in the comparison simulations. In order to evaluate the general case, we set the path-loss exponent to 2 in the outdoor environment. The transmission power is fixed at the highest level, which is 31 on TelosB sensor nodes with an MSP430 MCU and a CC2420 RFIC integrated on board. Since the IEEE 802.15.4 ZigBee radios used on TelosB and TmoteSky motes are capable of operating on 16 different frequencies with nonoverlapping channels of a fixed bandwidth of 2 MHz, the maximum number of available channels is set to 16. Based on the physical model, we find that the path-loss exponent can be set to 1.8 and 2.3 to simulate the real outdoor environment without and with obstacles, respectively, according to the real RSSI measurement experiment [28], as demonstrated in Figure 8.
Outdoor Scenario Settings. Sensor nodes are randomly deployed in a fixed 100 m × 100 m square field by the Waxman random network topology generator [29]. Random deployment is chosen because sensors are usually scattered randomly in the wild. The number of sensor nodes is varied between 20 and 60 to simulate different levels of density.
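The working schedules described in the simulation setup, with one randomly chosen active slot per node and cycle, can be generated as in the following sketch. It assumes that a duty cycle d corresponds to a cycle of round(1/d) slots, which the text does not state explicitly; the function names and the seed are hypothetical.

```python
import random

def random_working_schedule(nodes, duty_cycle, seed=None):
    """Pick one active slot per node, uniformly at random within a cycle.

    Assumption: one active slot per cycle, so the cycle length in slots
    is the rounded reciprocal of the duty cycle.
    """
    rng = random.Random(seed)
    period = round(1 / duty_cycle)
    wake_slot = {v: rng.randint(1, period) for v in nodes}
    return period, wake_slot

def is_active(node, slot, period, wake_slot):
    """True if the node is awake in the given (1-based) time slot."""
    return (slot - 1) % period + 1 == wake_slot[node]

# 40 nodes with a duty cycle of 0.25, i.e. a cycle of 4 slots.
period, wake_slot = random_working_schedule(range(1, 41), duty_cycle=0.25, seed=1)
print(period, wake_slot[1])
```

The schedule is drawn once and then reused for the whole run, matching the statement that working schedules stay fixed during data collection.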
Indoor Scenario Settings. We use the real data trace from the 54 sensors deployed in the Intel Berkeley Research Lab, shown in Figure 9, which monitor building environment information such as temperature, light, humidity, and voltage values [30]. This data trace contains the aggregate connectivity data averaged over all time for 37 days, whose distribution and Cumulative Distribution Function (CDF) are shown in Figure 10. In our indoor scenario simulations, we establish only the links whose link quality is above 0.5.
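The link-selection rule can be sketched as a simple filter over per-link qualities. The input format, a dictionary keyed by (sender, receiver) pairs, and the sample values are assumptions made for this illustration and do not reflect the actual layout of the trace.

```python
def build_topology(link_quality, threshold=0.5):
    """Keep only directed links whose quality is strictly above the threshold.

    link_quality: {(sender, receiver): delivery probability in [0, 1]}.
    Returns {sender: set of receivers}; every node appears as a key.
    """
    adjacency = {}
    for (u, v), quality in link_quality.items():
        adjacency.setdefault(u, set())
        adjacency.setdefault(v, set())
        if quality > threshold:
            adjacency[u].add(v)
    return adjacency

# Hypothetical link qualities (assumption, not taken from the trace).
sample = {(1, 2): 0.9, (2, 1): 0.4, (2, 3): 0.5, (3, 2): 0.75}
topology = build_topology(sample)
print(topology)
```

Links are kept directed here because the quality of a link may differ between its two directions.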
Performance Metrics. We use the following metrics to evaluate the network performance. (1) Collection delay is measured as the time elapsed from the generation of the packets until all ( − 1) packets reach the sink, reported as the total number of time slots needed to complete data collection. (2) Energy consumption is measured by the total number of transmissions needed to deliver all packets to the sink. Since the power level is fixed, the power consumption of each transmission is equal. (3) Delivery ratio is measured as the ratio of the number of successful transmissions to the total number of transmissions needed to deliver all packets to the sink.
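The three metrics can be computed from a transmission log as in the sketch below. The log format, one (slot, sender, receiver, success) tuple per transmission attempt, is an assumption, as is the convention that all packets are generated at the start of slot 1 so that the delay equals the slot of the last transmission.

```python
def collection_metrics(log):
    """Compute collection delay, energy proxy, and delivery ratio.

    log: list of (slot, sender, receiver, success) tuples, one entry per
    transmission attempt (hypothetical format).
    """
    delay = max(entry[0] for entry in log)          # slot of the last transmission
    transmissions = len(log)                        # energy proxy: fixed power level
    successes = sum(1 for entry in log if entry[3])
    return {
        "delay_slots": delay,
        "transmissions": transmissions,
        "delivery_ratio": successes / transmissions,
    }

# Hypothetical log with one collision that is retransmitted later.
log = [(3, 3, 5, True), (6, 1, 3, True), (7, 3, 5, False), (11, 3, 5, True)]
metrics = collection_metrics(log)
print(metrics)
```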

Simulation Results
(1) Results in Outdoor Scenario. Each simulation is repeated 100 times, and we report the average values as statistical results. In order to evaluate the impact of the duty cycle, the duty cycles are set to 0.05, 0.1, 0.15, 0.2, and 0.25, respectively. Forty nodes are randomly deployed in a 100 m × 100 m square field. In order to show the improvement in time efficiency obtained by eliminating all interfering links, RCTS is compared with MDCD-IL, which uses extra time slots to eliminate interfering links. We present the results comparing our RCTS algorithm with MDCD-IL and the Dijkstra Algorithm, as well as the upper and lower bounds on the data collection delay. Figures 11 and 12 show the data collection delay and the number of transmissions under different duty cycles, respectively. From Figure 11, we can see that the RCTS algorithm has a smaller delay than the Dijkstra Algorithm under all duty cycles while retaining a low energy cost, since the number of transmissions remains small and stable over different duty cycles, as shown in Figure 12. As Figure 11 depicts, the delay of the RCTS algorithm is lower than that of the Dijkstra Algorithm by 32% to 11% as the duty cycle increases from 0.05 to 0.25. Since the time available for nodes to receive packets is notably reduced, a lower duty cycle has a higher probability of incurring collisions. The channel and time scheduling of the RCTS algorithm can largely reduce collisions, and the advantage of the RCTS algorithm is more obvious at lower duty cycles. Comparing RCTS with MDCD-IL, we observe that using multiple channels to avoid the interfering links efficiently reduces the data collection delay. It is also observed that the collection delay of the RCTS algorithm almost overlaps its lower bound, which validates that the RCTS algorithm can obtain the optimal solution.
From Figure 12, we find that the duty cycle has only a slight effect on the number of transmissions in the RCTS algorithm, which outperforms the Dijkstra Algorithm by up to 67%. Figure 13 shows that the number of transmissions is positively correlated with the total number of nodes in the network, while it is little related to the duty cycle. Thus, reducing the duty cycle is a good way to save energy, since it does not increase the number of transmissions. The reason is that the RCTS algorithm schedules links without collisions, so no retransmissions are needed. We can also infer that the delay in low-duty-cycle WSNs is mainly caused by sleep latency instead of retransmission.
We use and to represent the upper bound of the collection delay and the time distance from the upper bound to the optimal latency, respectively. We define the ratio of and to evaluate the effectiveness of the RCTS algorithm. From Figure 14, we also find that the effectiveness of the RCTS algorithm is little affected by the number of nodes and the duty cycle and stays in the range between 30% and 50%.
From Figure 15, we can observe that the delivery ratio rises rapidly when the duty cycle decreases to 0.2. The concurrent transmissions on the same channel are alleviated, since the probability that nodes are active simultaneously is reduced. When the duty cycle is extremely low (less than 0.1), we observe that a single channel is enough to deliver packets with few interfering links as the deployment gets sparser (less than 0.4). This happens because at low densities there is less interference and fewer concurrent transmissions take place. We also find that the delivery ratio can be over 0.9 when the duty cycle and density are both low, which can satisfy the requirements of most wireless networks.
Figure 16 shows the number of channels assigned by Algorithm 3 for different node densities when the duty cycle is set to 0.25. As the node density increases, more channel resources are needed to eliminate the interfering links. As we can see from Figure 16, when the number of nodes increases to 60, we need at most 13 channels to avoid interfering links. We find that channel scheduling suffices to eliminate most interference for moderate-size networks of about 60 nodes in the outdoor environment, even when the duty cycle is not extremely low (0.25).
(2) Results in Indoor Scenario. We also use the exponential path-loss model for signal propagation, with the path-loss exponent varying between 3 and 4, which is typical for the indoor environment. Based on the physical interference model, the transmission power is also set to the highest level and the SINR threshold is set as = −3 dB. The above parameter settings are the same as in [24].
From Figure 7, we see that the delivery ratio is relatively high when all nodes communicate on the same channel in the indoor environment. Since the signal strength rapidly degrades with the increase of communication distance, the interference among neighbors will decrease accordingly.

Thus, the occurrence of interfering links is greatly reduced. Although the delivery ratio of the CFDC algorithm is good, we can also use multichannel scheduling to further improve its performance and ensure stable communication. Figure 17 shows the mean of the maximum number of channels assigned by Algorithm 3 for different path-loss exponents ( ). The performance improvement of RCTS over the CFDC algorithm is more pronounced when the duty cycle is high.

Input: The original network topology.
Output: The optimal data collection paths and channel assignment with the minimum delay.
(1) Call the DFDC algorithm to obtain the optimal link schedule on the same channel.
(2) The receivers check the SINR condition among the concurrently transmitting senders
(3) if there exist receivers with SINR ≤ then
(4) The receiver with lower ( ) assigns one available channel.

Conclusion
In this paper, we studied the problem: How fast can raw data be collected from all source nodes to a sink in low-duty-cycle WSNs with general topology? Both lower and upper tight bounds were given for this problem. We presented both centralized and distributed fast data collection algorithms to address this problem, both of which are able to find an optimal solution in polynomial time when no interfering links happen. When interfering links happen, multichannel scheduling is introduced to eliminate the interfering links.