Modeling of Multihop Wireless Sensor Networks with MAC, Queuing, and Cooperation

We present a Markovian decision process (MDP) framework for multihop wireless sensor networks (MHWSNs) to bound the network performance of both energy constrained (EC) networks and energy harvesting (EH) networks, both with and without relay cooperation. The model provides the fundamental performance limit that a MHWSN can theoretically achieve, under the general constraints from medium access control, routing, and energy harvesting. We observe that the analyses for EC and EH networks fall into two branches of MDP theory, which are finite-horizon process and infinite-horizon process, respectively. The performance metrics for EC and EH networks are different. For EC networks, an appropriate metric is the network lifetime; for EH networks, an appropriate metric is, for example, the network throughput. To efficiently solve the models with high dimension, for the EC networks, we propose a novel computational algorithm by taking advantage of the stochastic shortest path structure of the problem; for the EH networks, we propose a dual linear programming based algorithm by considering the sparsity of the transition matrix. Under the unified MDP framework, numerical results demonstrate the advantages of cooperation for the optimal performance, in both EC and EH networks.


Introduction
This paper considers modeling the optimal performance of multihop wireless sensor networks (MHWSNs), which can be energy constrained (EC) networks or energy harvesting (EH) networks, both with and without relay cooperation.The network performance of a MHWSN is a complex function of sensors' harvested energy, traffic volume, routing protocol, and medium access (MAC) technique.An analytical approach that is suitable for WSN traffic and that derives the optimum would serve as a valuable benchmark for heuristics MAC and routing protocols.However, such a multihop framework has not been presented.In this paper, we explore the optimum by presenting a unified Markov Decision Process (MDP) analysis that can analyze both EC and EH networks.We observe that the treatments for both networks fall into two branches of MDP theory, the finite-horizon process and the infinite-horizon process, respectively.
Existing analyses have mainly focused on a single node [1], a single link [2], or a single-hop network [3].Departing from the literature, this paper addresses the general multihop network with neither assumptions on node distribution nor assumptions on total traffic pattern, motivated by the following reasons.First, since the communication range of a single node is limited, single-hop deployment [3] cannot always meet the requirements of many applications, such as environmental and structural monitoring, security and defense, and wireless body area networks [4].Second, the traffic pattern of nodes in a MHWSN is unbalanced, invalidating the homogeneous or deterministic traffic assumptions, which are widely assumed for wireless ad hoc or mesh networks.In particular, some nodes near the Sink consume their energy faster because they must use their energy to relay other nodes' data.In EC networks, this unbalance creates the well-known "energy hole" problem [5].In EH networks, the impact on energy consumption is still nontrivial.Moreover, the extent to which the traffic is unbalanced is not known a priori and is usually determined by a routing protocol.Therefore, towards quantifying the network performance, careful consideration should be taken to capture the peculiarity of the traffic 2 International Journal of Distributed Sensor Networks pattern.Third, because the network's energy consumption rate is closely related to nodes' interaction in various aspects including channel access, routing, and cooperation, an optimal solution obtained by focusing only on a point-to-point link does not imply the global optimum for the networkwide objective.In fact, quantifying the latter problem is fairly challenging.
On another track, cooperative transmission (CT) is a mixture of communication protocol and physical (PHY) layer combining scheme that allows spatially separated nodes to collaborate to transmit a single message by forming a virtual multiple-input-single-output (VMISO) array.A receiver, gaining a signal-to-noise ratio (SNR) advantage of 10 dB to 20 dB through PHY combining [6], can decode the message at an extended range [7].CT range extension has been experimentally demonstrated in [8], and it shows significant impact on Layer Two and Layer Three of a WSN by eliminating the "energy hole" in EC network [9,10] and by providing better Quality of Service in EH networks [11].Time-division CT (TDCT) is one type of cooperation, in which the source node multicasts the packet, and the cooperators that decode the packet forward it in orthogonal time slot [12].Our previous work OSC-MAC [13] shows notable lifetime improvement over state-of-the-art MAC protocols by using TDCT to balance energy.Yet, it is noted that the performance gap between the CT MAC protocol [9,10] with fixed routes and the theoretically optimal value is unknown.Also note that the analysis in [11], among others, assumes perfect link scheduling, which contrasts with the practical situation wherein link activities are subject to MAC constraints and packet collisions occur.All the above discussions have further motivated our analysis to comprehensively consider energy harvesting, routing, MAC, and cooperation, from the network perspective.Our previous work [14] models the lifetime of battery based multihop sensor networks, which, however, does not apply to energy unconstrained networks (e.g., through energy harvesting).This paper extends the model in [14] to provide an analysis framework that apply to both energy constrained (EC) networks and energy harvesting (EH) networks.
The rest of the paper is organized as follows.Section 2 presents the related work.Section 3 describes the system model and assumptions.An MDP formulation is detailed in Section 4. The computational method and numerical evaluations for non-CT and CT networks are presented in Sections 5 and 6, respectively.Finally, Section 7 concludes the paper.

Related Work
In this section, we describe existing works related to modeling for wireless sensor networks in terms of the scope of modeling, the assumptions, and the technical tools.Then we highlight the challenges solved and contributions made in this work in comparison.
Existing analyses are mainly limited to a single node, a single link, or a single-hop network [15].For example, [1] focuses on energy management of a single energy harvesting (EH) node for achieving maximum throughput and minimizing delay, using an MDP [16] model.While it is assumed in [1] that the characteristic of traffic and energy is stationary (same as in this work), it also assumes infinite packet queue and energy queue, which is not practical.In [17], a method based on stochastic timed automata and statistical model checking is proposed to model a WSN protocol called timing-sync protocol, which however focuses only on node behavior.In [18], an MDP model is used to model the sensor processor (CPU) energy and Petri nets are used to model an energy constrained (EC) sensor node.In [2], an MDP model is presented for EH tags in a point-to-point link, assuming a time-slotted system and a stationary energy harvesting model.Transmission of EH nodes in a single link and in the context of a fading channel is tackled in [19].In [3], an MDP model is used to formulate the lifetime of a singlehop EC network, which considers sensor scheduling in a scenario where only a fraction of sensors collect information and communicate directly with a one-hop Sink.In [20], considering nodes that are equipped with a hybrid energy storage system, the authors provide an MDP model for a single-hop EH network.The major limitations of these works are their inability to analyze the performance of a network that extends beyond one-hop deployment and the lack of analysis for cooperative transmission.
For multihop networks, the past analyses have exploited various approaches.The optimal lifetime of a multihop WSN is derived in [21] using linear programming (LP) formulation based on traffic balance conditions.However, the analyses in [21] are limited to only the routing layer.In [22], a model based on mixed integer linear programming is proposed to identify energy-efficient route from the source sensor to the Sink; yet, it considers only energy minimization and does not produce network-wide objective optimization.In [23], the authors provide a Lyapunov analysis for multihop EH WSNs, which considers the dynamics of energy and queue, to maximize the utility.However, [23] is based on queue stability, which is not suitable for low-traffic WSNs.Reference [24] studies the upper bound on the lifetime of data gathering sensor networks, while [25] derives the upper bound on the lifetime of large-scale networks using the theory of coverage process.Both [24,25] are based on the assumption that the data sources are deployed with a particular probability density function or process; thus they are unable to capture the inhomogeneous traffic characteristic in a multihop network.We note that all the aforementioned works fail to consider MAC constraints, as they assume perfect link scheduling and thus ignore packet losses and retransmissions, which contrasts the practical situation wherein link activities are subject to MAC constraints and packet collisions occur.Moreover, none of these works, except for [21], considers cooperative transmission.
For cooperative networks, the existing analyses are typically limited to single-hop networks.In [26], the authors consider the lifetime maximization in an amplify-and-forward cooperative network and model the energy dissipation of nodes as a Markov chain.In [27], the authors develop a general probability model to study the outage performance of the best-relay selection with adaptive decode-and-forward cooperative network.Relatively less works have focused on analysis of multihop cooperative networks.The LP model in [28] has been recently extended to multihop CT networks [29] by considering all single-input-single-output (SISO) and virtual multiple-input-single-output (VMISO) links.Again, none of these works has considered MAC layer link constraints, and therefore the bounds that they provide are optimistic.Moreover, because the interference range has significant effect on the performance of the MAC, and the interference range of CT could differ from that of a SISO transmission, a complete evaluation of multihop cooperative networks is hard to obtain without the inclusion of MAC constraints.
The challenges in modeling a multihop WSN reside mainly in the scope of modeling and the computational complexity.In this paper, we report a unified modeling framework with reasonable complexity that provides the fundamental performance limit that a MHWSN can theoretically achieve.Our model applies to both energy constrained networks and energy harvesting networks, and it captures the general constraints from MAC, routing, and energy dynamics.We further present two computational methods taking the advantage of the characteristics of EC networks and EH networks, respectively.

System Model and Assumptions
Each sensor node has two queues, the packet queue where data packets are stored and the energy queue where energy is stored.Both the packet queue and the energy queue have limited capacity.We assume that no dynamic transmit power control technique is used; that is, all the nodes use the same transmit power level, which means CT is used exclusively for range extension purpose instead of power reduction.

Time Slots in the System
. Unlike [30] where the time slot is varied and is defined as the interval between two random events on a node, that is, between instances of energy arrivals, we consider a mini-time-slotted system as shown in Figure 2, where the slot duration is constant and slots are normalized to integral units  ∈ {1, 2, 3, . ..}.By time , we refer to the duration within [,  + 1].The randomness of events occurs during the slot.The reason to use constant slot interval is twofold: first, this provides the time from the perspective of a network that comprises many nodes; second, this allows us to model the packet transmission duration, against the instantaneous information capacity, because packetizing is very common in wireless standards, including IEEE 802.15.4, in which rate changing is infeasible in the duration of onepacket transmission.
In addition, we remark that the time slots in this paper are different and more general than those in the link scheduling literature.In the latter case, the time is divided into slots equal to the length of packet transmission time, and all nodes are synchronized to slot boundaries.Our model does not have such a constraint, so the packet length can span multiple slots and nodes are not necessarily synchronized to slot boundaries.

Topology Model.
The multihop network topology is modeled as a directed graph  = (, ). = {1, . . ., } is the set of nodes except the Sink, whose ID is 0, and  is the set of links comprising both single-input-singleoutput (SISO) links and VMISO links; that is,  =  so ∪  vo .A SISO link  exists if its source and the destination are within the maximum transmission range.Given the transmission power   and the required receiving power  ,min , the maximum transmission range  max is defined as  max = (  / ,min ) 1/ , where  is the path loss exponent and  is the constant of proportionality [31].A VMISO link exists if the source and the destination are within maximum transmission of CT with the cooperators.A VMISO link can be formed between the cooperating nodes (the initiator and cooperators) and the VMISO receiver.In the case of VMISO link, we assume the destination is the Sink; for example, Nodes A and B form a VMISO link to the Sink in Figure 1.Therefore, a link is represented by the 3-tuple, ⟨(), (), C(())⟩ ∈ , where () and () are the source and the destination, respectively; C(()) is the cooperators of ().This link is a valid VMISO link if and only if the same as in [21], where  ,() is the distance between Node  and the VMISO receiver and   is the SNR gain as a function of the number of cooperating nodes; for example,  2 = 10 dB,  3 = 13.5 dB according to [32] for BPSK modulation at a bit error rate of 10 −3 .Note that, in the case of SISO link, C(()) = ⌀ and, in the case of VMISO link, () = 0.

Interference Model.
A node has three ranges associated with it: the transmission range (TX), the interference range (IF), and the carrier sensing range (CS), as shown in Figure 1.
Similar to [33], we assume that link (, , ⋅) interferes with link (, V, ⋅) if either the distance dist(, V) ≤ IF() or  = V.For instance, in Figure 1, Node  and any node in the gray area can start transmission at the same time because they are out of carrier sensing (CS) range of (i.e., hidden from) each other.Also, located in Node V's interference (IF) range, the hidden node's transmission interferes with Node V's reception.This phenomenon directly results from the CSMA mechanism of the MAC layer and the relationship between the three ranges (TX, IF, and CS) [34].

Medium Access Control Model.
We assume that carriersensing-medium-access (CSMA) is performed before a node attempts to transmit.Further, similar to [35], it is assumed that (i) a node cannot transmit and receive at the same time; (ii) a node can transmit if none of its neighbors (in CS range) is transmitting; (iii) link errors result only from collisions due to hidden terminals; (iv) nodes receive with perfect capture; that is, a packet is successfully decoded if the receiver and none of its neighbors are transmitting at the start of packet; and (v) the propagation time is zero.Our model can be  extended to incorporate fading and shadowing effects of the wireless channel in computing packet errors; however, in this paper we assume (iii) to simplify the presentation.

Traffic Model.
The transmission duration of a link  ∈  is assumed exponentially distributed with the expectation   , the same as in [35,36].Note that this assumption is inaccurate; however, it is necessary to make the MDP problem tractable [36].The memoryless property of the exponential distribution indicates that, given a packet is being transmitted at the beginning of a time slot, it completes the transmission within the slot of length Δ with probability   = 1 −  −Δ/  , regardless of the number of slots it has been transmitting.Making Δ small enough allows us to assume that at most one link can finish transmission during a time slot; that is, the probability that more than one link finishes transmission is (Δ).

Energy Harvesting Model.
A realistic power consumption model for a sensor node has been studied in [37], which decouples the power usage into baseband DSP circuit, the RF front-end circuit for transmitting and receiving, the power amplifier (PA), and low noise amplifier (LNA).A similar circuit-level analysis is also performed in [38] for energy modeling in cooperative multiple-input-multipleoutput (MIMO) sensor networks.In [39], six different statistical models have been used to fit empirical datasets of a solar-powered energy harvesting sensor node.Our recent work [40] provides an energy model for energy harvesting node considering harvesting and leakage in a supercapacitor.
While we are aware of more complex energy models, for simplicity, in this paper we assume a simpler model.However, we note that it should be straightforward to incorporate complex energy consumption and harvesting models into our framework.In this study, the energy harvested at Node  during any slot  is modeled as a binary RV, which is denoted by ℎ  () ∈ {0, 1}.This RV has PMF Pr[ℎ  () = 1] = () and Pr[ℎ  () = 0] = 1 − ().Representing the energy harvesting rate, () is assumed i.i.d.over time slots.Thus, similar to [1,41], the energy of Node  evolves as where  (,) com is the energy consumption of Node  if it participates in a finished link transmission  and  max is the battery capacity of a node.We note that other EH models can also be applied, for example, the correlated EH model [2] where the energy harvested in a slot is correlated to the previous slot.Further, we assume that the energy harvested in a particular slot is available at the end of the slot.

Markov Decision Process Formulation
In this section, we describe the Markov Decision Process formulation for the EC and EH networks.We model the network state space by a tuple consisting of the transmission set in the network considering the MAC constraints, the queuing level, and the energy level for each node.Next we specify the action space at each time slot, which affects the network state.Then we analyze the system transition dynamics by decoupling the system into the tuple components with condition and then combining them based on the chain rule.

Energy Constrained (EC) Networks
4.1.1.Network State Space.The network state is defined as s ≜ {L, q, e} to include the transmission set L, the queuing level q of each node, and the energy e of each node.Note that the state space is not the Cartesian product of each component's state space, because they are interactive (e.g., an active link cannot have an empty queue at its source node).
The transmission set L() includes the links that are active (in transmission) at time , which must not violate the carrier sensing constraints of the MAC.Note that it does not mean that all the receivers of links in the transmission set will receive successfully.Denote the state space of L as L.
This component includes the collisions in the MAC layer.We denote the links that are free of collision as Φ(L) ⊂ L. Φ(L) is deterministic given L, and Φ = Φ so ∪ Φ vo .
To find L, it is equivalent to find the matchings of a graph.Given a graph  = (, ), a matching () in  is a set of nonadjacent edges (i.e., no two edges share a common vertex), implying L ∈ {()}.Further, the MAC layer CSMA adds constraints on L, where for any two links  1 ,  2 ∈ L, the sources of the two links (( 1 ), ( 2 )) must be out of carrier sensing (CS) range of each other.Therefore, L is determined by graph  and the binary hearing matrix H = [ℎ  ] ,∈ of the network, where the element ℎ , = 1 if and only if (iff) Node  and Node  are within CS range of each other.Though enumerating matchings of a graph is NPcomplete, Bron-Kerbosch algorithm has been shown to be one of the fastest [42] and has been used in this paper.

Decision Epochs and Action Space.
A decision epoch corresponds to the beginning of a time slot.The set of decision epochs are denoted by {1, 2, . . ., }.When  is finite (infinite), the decision problem is referred to as a finite-horizon (infinite-horizon) problem.An action, (), is to admit a new link to the "remaining" transmission set L(), which comprises those links that did not complete transmission during slot  − 1.Note that L() ⊂ L( − 1) ∪ ( − 1).The action space at time  when the system state is s() is represented by where  suf indicates e  ≥  (,) com , ∀ ∈ .Therefore, a link  can join L() iff its source has a nonempty queue, L() ∪  is also a transmission set, and the participating nodes have enough energy.Note that it is assumed that at most one link can be admitted to L(), because Δ is small.Also, note that the null set ⌀ ∈ A(s()).As a result, an action can be either a "CT" link, a "non-CT" link, or null (no new transmission).

State Transition Dynamics.
We first characterize the state transition of each component.
(1) Transmission Set Dynamics.Let action () ∈ A(s()) denote the link admitted to the transmission set.Denote {  () | L(), ()} as the probability that only link  ∈ L() ∪ () finishes transmission during slot  and { ⌀ () | L(), ()} as the probability that no link finishes transmission during slot .The time index is dropped when there is no ambiguity.For abbreviation, we denote the transition kernel as Define D () ≜ {L ∪ } \ L  ; then we can get (2) Queue Length Dynamics.Considering that EC networks are most suitable for light traffic applications, such as monitoring, we assume that at most one node self-generates a packet during a slot.Let I  represent a vector with its th element being 1 and other elements being 0, 1 ≤  ≤ , and let I 0 be the zero vector.The probability that only Node  or no node ( = 0) self-generates a packet is given by where () is the probability of a self-generated packet arrival for Node  in a time slot and is assumed i.i.d.over time slots.
Then the queue evolution can be described by the following cases.
Case 1 (( 1 ): q  = q).This can happen for two reasons: ( 1 ) no link transmission affects the queue state and no new exogenous arrival affects the queues and ( 1 ) some link (, 0, ⋅) destined to the Sink successfully transmits and the source self-generates a new packet: Case 2 (( 2 ): q  = q + I  ,  ∈ ).This case includes two possibilities: ( 2 ) some SISO link (⋅, , ⌀) that is directed to Node  successfully transmits and the source self-generates a new packet and ( 2 ) no link transmission affects the queue state and there is a new exogenous packet arrival to Node : International Journal of Distributed Sensor Networks Case 3 (( 3 ): q  = q − I  ,  ∈ ).This happens because some link (, 0, ⋅) destined to the Sink successfully transmits and no new exogenous arrival affects the queues: Case 4 (( 4 ): q  = q−I  +I  , ,  ∈ ,  ̸ = ).This case includes two possibilities: ( 4 ) some SISO link (, , ⌀) successfully transmits and there was no change in queue due to new exogenous arrival and ( 4 ) link (, 0, ⋅) successfully transmits and Node  self-generates a new packet: Case 5 (( 5 ): q  = q − I  + 2I  , ,  ∈ ,  ̸ = ).This occurs because the SISO link (, , ⌀) successfully transmits and Node  self-generates a new packet: Case 6 (( 6 ): q  = q − I  + I  + I  , , ,  ∈ ,  ̸ =  ̸ = ).This occurs because the SISO link (, , ⌀) successfully transmits and a different Node  self-generates a new packet: Therefore, the transition kernel of q() is expressed as (23).
(3) Energy Evolution Dynamics.The process e() is dictated by transmission energy and receiving energy consumption incurred during a finished link transmission, regardless of success or failure.In the CT case, the cooperators consume additional energy in receiving the broadcast packet initiated by the source and in conducting CT.The transition kernel of e() is given in (25).
(4) System Dynamics.The transition matrix of the system states s ≜ {L, e, q} can be obtained by the following theorem.
4.1.4.Expected Total Rewards.During a time slot, the system obtains a reward (s, ) = 1 if a packet was delivered to the Sink, and (s, ) = 0 otherwise, where s is the current state.
Then, with the state space  and the termination states   , for s ∈  \   , we have And, for s ∈   , (s, ) = 0.The expected total rewards of the process starting from an initial state s, under the policy , are denoted by

MDP Formulation.
A transmission policy is a series of decision rules  = [(1), (2), . ..],where () : S() → A(S()).The maximum lifetime J * (s) is given by The optimal lifetime J * (s) is the unique solution of the Bellman's optimality equation [16]: A policy  * is optimal if it achieves the maximum expected lifetime for all starting states; that is,

Energy Harvesting Networks.
The link set dynamics in the EH network are the same as in the EC network.In this section, we give a more general expression for the queue system, relaxing the low-traffic assumption in the previous sections.EH networks may provide better support for high traffic load than EC networks [43].
In a uniform form, the evolutions of q and e are governed by the following balance equations: (20) where  ∈ {, },  () = q, and  () = e; and R () and M () will be defined below.

Queue Length Dynamics.
In (20), M () is a || × || diagonal matrix.The diagonal element M ()  = 1 if the transmission of link  is completed and successful (a finished transmission, while always consuming energy, is not necessarily successfully decoded by the receiver, because of collision due to hidden terminals; in case of failure, the packet remains in the queue of the transmitter), and M ()  = 0 otherwise.Note that it is assumed ∑ ∈ M ()  ≤ 1; that is, at most one link can finish transmission.Γ () is the transmission schedule vector, which satisfies Γ ()  = 1, ∀ ∈ L ∪ , and Γ ()  = 0 otherwise.R () is the  × || routing matrix, whose element in the th row and th column is if  () =  and Node  is not Sink, () is an  × 1 vector representing the number of selfgenerated packets and  ()  =   .Let I  represent a vector with its th element being 1 and other elements being 0, 1 ≤  ≤ , and let I 0 be the zero vector.Denote  () () as the nodes whose queue is not full at time  and define the difference International Journal of Distributed Sensor Networks Then considering exogenous packet arrivals, the queue status is updated as follows: where 1{⋅} is the indicator function.

Energy Evolutions.
Using analysis similar to that for the queue in Section 4.2.1, denote  () () as the nodes whose battery is not full at time  and define Then, considering the influence of energy harvesting, we get 4.2.3.System Dynamics.Similar to the EC network, the transition matrix of the system states s ≜ {L, e, q} for EH network can be obtained by the following theorem.

Performance Criterion and Reward
Function.Let V  (s) denote the expected total reward, given an initial state s and a policy  (a series of decisions), where  is the decision-making horizon length.The definition of the reward function, (s, ), depends on application requirements and can be subjective to network designers.We define two reward functions as follows: In (27), a unit of reward is obtained if a packet is successfully received by the Sink.Equation ( 28) is a weighted sum of the rewards from delivered packets and the penalty from a QoS degradation (e.g.,   are states where any node's energy drops to zero).
Since the state space and the action space are finite, with a finite reward function, ( 26) can be expressed as where it is assumed that  is geometrically distributed with parameter , 0 ≤  < 1.Thus, the expected value of  is 1/(1− ).The parameter  is the discount factor, which measures the present value of one unit of reward received one period in the future [16].

Optimality Equations.
The optimality equation for the expected total discounted reward criteria given an initial state, a.k.a.Bellman's equation, is given as follows:

Computational Methods
In this section, we present the offline computational algorithms for the EC and EH networks.The methods seek to find the global optimum and do not require any information exchange among the nodes.

Energy Constrained Networks.
Considering that the total energies are nonincreasing, we group the states according to the sum energy (in decreasing order) and solve the stages backwards.The transmission sets are computed using the Bron-Kerbosch algorithm [42], which is widely used and considered as one of the fastest.In each stage, the computation of the energy and queue state spaces is similar to the Bin-Ball problem (the Bin-Ball problem refers to enumerating the ways of allocating  1 balls into  2 bins.In our problem, the "bins" are the nodes and the "balls" are the total energies or queued packets).For instance, the number of ways to allocate  1 units of energies into  2 nodes with minimum energy requirement of  =  th + 1 (in S \ S  ) is given by For an integer  (stage), the set of states   is defined as For each   and s ∈   , we have Therefore, we formulate a mixed integer linear programming (MILP) problem (Subproblem-1): where (s, ) are obtained from previously computed stages: The computational method is summarized in Algorithm 1.

Energy Harvesting Networks.
The following theorem provides the basis for a linear programming (LP) approach to solve the MDP problem.The proof can be found in [16].Theorem 3. Suppose there exists k, for which k ≥ T(k); then k ≥ k *  .T is the nonlinear operator defined as T(k) ≡ sup all  {r  + P  k}.
From the observation in Theorem 3, the primal LP is constructed as follows.
Primal LP.Consider the following: for all  ∈ A(s) and all s ∈ .(s) are chosen to be positive scalars that satisfy ∑ s∈ (s) = 1.
However, it is more informative to solve this model using its dual LP [16], which has less rows in the constraints matrix.
Dual LP.Consider the following: and (s, ) ≥ 0 for  ∈ A(s) and s ∈ .
Note that the primal LP has ∑ s∈S |A(s)| rows and |S| columns, while the dual LP has |S| rows and ∑ s∈S |A(s)| columns.In [44], it is shown that the value function for each state in the primal LP is equal to the dual price (the dual price (a.k.a.shadow price) of a constraint is instantaneous International Journal of Distributed Sensor Networks change in the objective function by relaxing the constraint by one unit) corresponding to the constraint associated with the state in the dual LP.Thus, the value function associated with a given initial state of the primal LP is obtained by firstly solving the dual problem and then finding the dual price of the corresponding constraint.

Computation Complexity of Large-Scale MDP. It has been
shown that MDP is solvable in polynomial time by successive approximation methods such as value iteration, without the existence of a known strongly polynomial time algorithm [45].Notoriously, when dealing with large-scale MDP, the computation of iterations may appear prohibitive.In contrast, computation based on large-scale linear programming is fast solvable with the state-of-the-art LP solver, in polynomial time depending only on the size of  (the constraint matrix in  ≤ ) [46].Also, the practical fast -approximation algorithm is most useful for solving LP with exponentially many constraints to trade off solution accuracy for time [47].
In this paper, we do not target finding the optimal policy.However, we note that standard policy iteration algorithms and their various modifications can be applied offline to obtain the optimal stationary policies, at the cost of iterations over the whole state space.In addition, policies obtained from this network model would require a network centralizer to conduct the execution, which is less appealing than a distributed approximate optimal policy and which is out of the scope of this paper.

Numerical Evaluations
We present some numerical results on the optimal lifetime for the EC network and the expected total discounted reward for the EH network, under the 2-hop funnel topology (Nodes A, B, C, and D and the Sink), as in Figure 1.Besides the computational limitation inherent to MDP, the motivation for choosing a small network is as follows.It is observed from our previous work [10] that under duty-cycling, a large network is typically reduced into isolated small networks in the time domain, to reduce interference and collisions.This observation renders the possibility to analyze a small tree topology towards the whole network.

Energy Constrained Networks.
In this section, we evaluate the optimal lifetime of the EH network and impacts from certain parameters.Note that the funnel topology (with the existence of the bottleneck Node C) captures the essence of the energy hole problem.
Figure 3 depicts the optimal lifetime for non-CT and CT networks.We observe the lifetimes are linearly increasing with the battery capacity.We also observe that the performance of CT network is significantly higher than that of the non-CT network.For example, with battery capacity of 10 units, the lifetime improvement factor of CT network is 1.89 with 1 cooperator.We assume that either one or two cooperators are required to reach the Sink.This is motivated by the fact that theory [32] predicts that one cooperator can double the range for certain path loss exponents, but our testbed experiment shows that at least two cooperators are needed to double the range [8].Considering our VMISO link model in (1), we can show that two helpers (|C(())| = 2) are required for double range instead of one helper, if the path loss exponent increases by a factor log( 3 )/ log( 2 ).In practice, the number of cooperators required to conduct CT range extension is regulated by the channel status and different CT physical layer techniques; an efficient protocol should select just enough cooperators to avoid energy overuse.The lifetime with larger battery capacities can be obtained from linear regression.From the slope of the extrapolated lines (not shown), the lifetime growth rate with respect to battery capacity is 1.0 for non-CT, 1.6 for CT with 2 cooperators, and 2.1 for CT with 1 cooperator.Therefore, we conclude that CT gives less of an advantage in terms of lifetime, if more cooperators are required for the same factor of range extension.
Figure 4 shows the computed lifetime for different packet arrival rates (PARs).The PAR is normalized with the packet length, that is, the number of new exogenous packets per node during a packet duration.The PARs where the curves start to become flat represent the PAR thresholds, beyond which the numerical results are accurate.Figure 4 demonstrates that our model is accurate for a very large range of arrival rates (<0.4).For example, if a node generates a packet every 100 seconds and the packet transmission time is 100 ms, then the PAR is 0.001, well below 0.4.In other words, the queue model assumption is valid as long as the application generates less than 4 packets per node per second.

Energy Harvesting Networks.
In this section, we evaluate the expected total discounted reward and the impacts from  certain parameters.The reward function in ( 27) is used for plotting Figures 5, 6, and 7, while (28) is used for Figure 8.
Figure 5 shows the expected total discounted reward of non-CT and CT networks, versus different discount factors .The decision-making horizon is indicated by 1/(1 − ) time units.The system parameters are  = 0.1,  max = 4,  max = 1,   =   =  CT init = 1, and  CT co = 2, where   and   represent transmitting/receiving energy,  CT init represents the energy consumption of initiating CT, and  CT co is the energy consumption of a cooperator.We observe that the rewards increase as  increases due to an increase in the number of decision epochs, which allows prolonged operation of the network in the optimal states prescribed by the optimal policy.While CT network always outperforms non-CT network (55% to 61% improvement), the curve of the CT network also grows with a steeper rate.
Figure 6 depicts the expected total discounted reward of non-CT and CT networks versus energy harvesting rate , with  = 0.9999 or  = 0.99999.Other system parameters are the same as used in Figure 5.We observe that, for both non-CT and CT network, the rewards increase linearly when  is below 0.01.Then the curves grow with a slower rate, before the performances of non-CT and CT networks start to converge when the value of  reaches a threshold value  th .For  = 0.9999,  th = 0.04, and for  = 0.99999,  th = 0.07.transmission opportunities are less constrained by energy level, diminishes the benefits of CT.However, in practice, when harvesting rate is not large enough, the CT network still provides a significant gain over the non-CT network, as shown in the figure.
Figure 7 shows the expected total discounted reward of non-CT and CT networks versus exogenous packet arrival rate , with  max = 1 or  max = 2.The system parameters are  = 0.9999 and  = 0.02.It can be seen that when the offered traffic load () increases, while not saturating the networks, the rewards show a linear increase.When  is large enough, the rewards are bottlenecked by the queue capacity.We also observe that the reward of non-CT shows slight instability between  = 0.01 and 0.02, between which the rewards slightly decrease, for both  max = 1 and  max = 2.We did not explain this observation.However, the CT network does not show this behavior.Figure 8 gives an example that the calculated benchmark of performance is highly affected by the choice of reward function.The network designer can choose a reward function according to a particular QoS requirement.The -axis is the preference weighting factor  1 in the reward function in (28).Note that  2 = 1 −  1 represents the significance of an occurrence of QoS degradation.When a certain application deems a state with a node having zero energy as a severe situation, both non-CT and CT networks will be conservative in initiating a packet transmission, limiting the packets delivered in a given time horizon.On the other hand, the value functions increase as the significance relaxes.Again, CT network still outperforms non-CT network in both magnitude and rate in all cases.
In Figure 9, we evaluate the effect of the CT overhead on the expected total discounted reward of the CT network, with different preference weighting factors  1 = 0.1, 0.5, 0.9.The expected total discounted rewards of the non-CT network, which are constant, are also plotted for reference.The CT overhead is modeled as an extended packet transmission time, which is a result of the extra preamble symbols in front of the packet payload to achieve diversity and synchronization for the purpose of range extension [12,48].Specifically, the horizontal axis in Figure 9 is the CT overhead, quantified in terms of the fraction of the total packet length, from 0 to 10%.For  1 = 0.9, at the overhead rate of 0.1 (10%), the CT network outperforms the non-CT network by 38.2%.For the CT network, comparing with the overhead rate of 0, only 2.3% degradation is observed at the overhead rate of 0.1.We note that, in a more practical setting with control packets accounted for, we expect to see a bigger degradation value.However, fortunately it is shown that, even with control packets accounted for, CT still has a significant advantage over non-CT in both small networks [9] and large-scale multihop networks [13].

Conclusions
Energy harvesting (EH) and energy constrained (EC) multihop wireless sensor networks are modeled under a unified Markovian decision process framework.The model extends the literature by encompassing, from the network interaction point of view, energy harvesting, routing, MAC, and cooperative transmission.The performance of EC and EH networks is characterized by finite-horizon and infinitehorizon problems, respectively.The EC model captures the traffic imbalance due to the "energy hole," which is a known bottleneck problem limiting network lifetime.In the EH network, the expected total discounted reward model allows one to formulate different performance metrics of interest.The model of EC networks is solved by mixed integer linear programing, capitalizing on the stochastic shortest path nature of the problem, while the model of EH networks is solved by a dual linear programming approach.Numerical results evaluate the sensitivity of several network parameters on the optimal performance of non-CT and CT networks and show the nontrivial impacts of cooperative transmission.

Figure 2 :
Figure 2: A timeline illustration of the decision process model for the network.The decision epochs are at the beginning of each minislot.
Figure 1: An illustration of interference model and VMISO link.Node V is the receiver of Node .Node 's hidden nodes in the gray area will interfere with V's reception.
This threshold is related to both energy harvesting rate and battery capacity.High harvesting rate, with which packet International Journal of Distributed Sensor Networks