Optimal Control of Epidemic Routing in Delay Tolerant Networks with Selfish Behaviors

Most routing algorithms in delay tolerant networks (DTNs) need nodes to serve as relays that carry and forward the message for the source. Due to selfishness, however, nodes have no incentive to stay in the network after getting the message (i.e., they become free riders). To make them cooperate at a specific time, the source has to pay them a certain reward, and this reward may vary with time. On the other hand, the source obtains a certain benefit if the destination receives the message in a timely manner. This paper is the first to consider the optimal incentive policy that achieves the best trade-off between the benefit and the expenditure of the source. To cope with this problem, it first proposes a theoretical framework that can evaluate the trade-off under different incentive policies. Based on this framework, it then explores the optimal control problem through Pontryagin's maximum principle and proves that the optimal policy conforms to the threshold form in certain cases. Simulations based on both synthetic and real motion traces show the accuracy of the framework, and extensive numerical results demonstrate that the optimal policy obtained from the model performs best.


International Journal of Distributed Sensor Networks

The source can obtain a certain benefit if nodes (consumers) get the message (advertisement) in a timely manner.
In addition, such benefit may vary with time. For example, the sooner the nodes get the message, the greater the benefit may be, so the source has an incentive to push the message to other nodes quickly. To achieve this goal, it has to pay a certain reward to the relay nodes to make them cooperate, and this reward may vary with time too. For example, the longer a node stays in the network, the more energy it may consume, so it may ask for a larger reward. In fact, nodes (e.g., phones, PDAs) are often devices operated by humans [9,10], and the buffer space or forwarding ability of a node can be seen as goods. The event that the source requests help from other nodes can therefore be seen as the source buying goods from humans, paying with virtual currency [11], discounts on service [12], and so forth. The message propagation process can thus be seen as a commodity trading process in which each human wants to maximize his or her reward, so these humans may adjust the price of their goods according to the market state. For example, if the remaining lifetime of the message is short, they may reason that the source is eager to transmit the message as soon as possible and that their goods (e.g., the forwarding service) are therefore important to the source, so they may raise the price. On the other hand, if the remaining lifetime of the message is long, they may reason that the source is in no hurry and is unwilling to pay high fees, so they may help the source for only a small reward. The price of the goods (e.g., the forwarding service) may therefore vary with time. In this environment, deciding whether to make nodes cooperate at a specific time is an important problem for the source.
For example, suppose that node j is willing to help the source for m nuggets (the price of the goods) at time t1 but requires only n nuggets at time t2. If m > n and t1 < t2, the source pays fewer nuggets when it requests help from node j at time t2, but this may decrease the message propagation speed, which is bad for the source. On the other hand, if the source spends fewer nuggets to make node j cooperate, more nuggets remain, so the source may have enough nuggets to make additional nodes cooperate, which is good for the source. The optimal policy of the source is therefore related to time. In this case, maximizing the total income of the source is not a simple problem, and solving it is our main contribution.
The main contributions of this paper can be summarized as follows.
(i) We are the first to consider the optimal incentive policy for the best trade-off between benefit and expenditure when the reward varies with time.
(ii) We propose a unifying framework based on a continuous-time Markov process, which can evaluate this trade-off for the source under different incentive policies.
(iii) Based on the framework, we formulate an optimization problem, explore it through Pontryagin's maximum principle, and prove that the optimal policy conforms to the threshold form in some cases. By comparing simulation results with theoretical results, we show that our theoretical framework is very accurate; in addition, extensive numerical results show that the optimal policy obtained from our model outperforms the other policies considered.

Related Works
In fact, our work is similar to the optimal control problem for the epidemic routing (ER) algorithm, such as the works in [13,14]. These works mainly study how to maximize the average delivery ratio when energy is limited, and the energy consumed by forwarding the message once does not depend on time. Therefore, these methods cannot solve the optimization problem in our paper, where the reward that relay nodes require depends on time. The work in [15] studies a problem similar to ours, but it seeks the optimal forwarding policy when the total fee is limited, whereas we study the trade-off between income and expenditure, so the two problems differ. There are many other selfish behaviors, such as individual selfishness and social selfishness [16,17], and some works study their impact. For example, Li et al. study the impact of social selfishness on the epidemic routing protocol [17] and then explore the impact of both individual and social selfishness on multicasting in DTNs [18]. However, these selfish behaviors depend on the social relations between nodes; for example, social selfishness denotes selfish behavior between friends, so the distribution of friends may have a certain impact. Existing studies have shown that nodes may have different numbers of friends; in particular, the distribution of friends may follow a power law [19]. Therefore, if we considered those selfish behaviors, we would have to classify nodes according to their number of friends, which yields a control problem with multiple parameters. This interesting problem is an extension of our work, and we will study it in depth in the future.

Network Model
Suppose that there are one source S, N relay nodes, and a destination node D. At time 0, only the source has the message, and it wants the destination to obtain the message before the maximal lifetime T expires. To achieve this goal, the source needs help from others. However, it has to pay a certain reward every time it makes a relay node cooperate, so it may not do this all the time. Note that only relay nodes that have the message can forward it to others, so the source only makes these nodes cooperate; in other words, the source is not willing to pay a reward to nodes that do not have the message. In this paper, we assume that at time t the source makes a relay node (e.g., node j) that has the message cooperate with probability p(t), and node j then gets the required reward, denoted by C(t). As shown in the previous section, C(t) may vary with time. On the other hand, the source obtains a certain benefit, denoted by B(t), if the destination gets the message at time t. In addition, we assume that all relay nodes are willing to receive the message. In fact, if they get the message, they may later earn a reward from the source, so they have an incentive to receive it, and this assumption is rational.
Nodes in the network can communicate with each other only when they come within each other's transmission range, which constitutes a communication contact, so the mobility of nodes is critical. In this paper, we assume that contacts between two nodes occur according to a Poisson process. This assumption has been used in wireless communications for many years, although some works show that it is only an approximation of the message propagation process. For example, the work in [20] reveals that nodes encounter each other according to a power law distribution but also finds that, over long traces, the tail of the distribution is exponential. A more recent work [21] studies a vehicle dataset in a large-scale urban environment and finds that the intermeeting time can be modeled by a three-segment distribution; although the first two segments do not obey the exponential distribution, the tail does. In addition, the work in [22] shows that individual intermeeting times can be shaped to be exponential by choosing an appropriate domain size with respect to the given time scale. Moreover, some works describe the intermeeting time of humans or vehicles by an exponential distribution and validate their models experimentally on real motion traces [23,24]. For these reasons, the exponential model is still widely used in many existing works, such as [25-27]. In this paper, we use the same model and assume that the intermeeting time between two nodes follows an exponential distribution with parameter λ. Simulations based on both synthetic and real motion traces show that our theoretical framework based on this assumption is very accurate.
Besides the intermeeting time, many other factors affect routing performance, such as the contact duration, bandwidth, and message size. If the bandwidth is large enough, the message may be transmitted successfully in one contact; if it is too small, the message may be hard to transmit in one contact even when the contact duration is long. Some works find that the contact duration may follow a Pareto distribution [28,29]. However, the Pareto distribution is hard to use in a theoretical analysis of routing performance, so most previous theoretical works ignore the impact of contact duration and assume that a contact is long enough to transmit the message, such as [13-17]. In this paper, we make the same assumption; note that it is rational when the message is small or the bandwidth is very large.
The commonly used variables of this paper are listed in Table 1.

Table 1: Commonly used variables.

N: Number of relay nodes
S: The source node
D: The destination node
λ: Exponential parameter of the intermeeting time (the biggest value)
T: The maximal lifetime of the message
p(t): The probability that the source makes a relay node cooperative at time t
C(t): The required reward at time t

Let X(t) denote the number of relay nodes that have the message at time t; the set of relay nodes that do not have the message at time t therefore has cardinality N − X(t).
Let I_j(t, t+Δt) denote the event that relay node j gets the message in the time interval [t, t+Δt]: if I_j(t, t+Δt) = 1, node j successfully obtains the message; if I_j(t, t+Δt) = 0, this event does not happen. Note that a relay node can get the message only from the source or from other cooperative relay nodes. Two nodes encounter each other according to an exponential distribution with parameter λ, so node j encounters a specific node (e.g., k) in the interval [t, t+Δt] with probability

1 − e^(−λΔt) = λΔt + o(Δt).    (1)

If node k is the source, node j gets the message immediately; if node k is a relay node, j can get the message from k only when k is cooperative, and a relay node is cooperative at time t with probability p(t). Therefore, the total probability that node j gets the message in [t, t+Δt] is

P(I_j(t, t+Δt) = 1) = λΔt (1 + p(t) E(X(t))) + o(Δt).    (2)

Combining (1) and (2), we can get

E(X(t+Δt)) = E(X(t)) + (N − E(X(t))) P(I_j(t, t+Δt) = 1).    (3)
Further, we can obtain

dE(X(t))/dt = λ (N − E(X(t))) (1 + p(t) E(X(t))).    (4)

One main metric of a routing algorithm in DTNs is the delivery ratio, which denotes the probability that the destination obtains the message within a given time. Let F(t) denote the delivery ratio when the given time is t. Before deriving its value, we first introduce G(t) = 1 − F(t), the probability that D does not obtain the message before time t, and let G_D(t, t+Δt) denote the probability that D does not get the message in the interval [t, t+Δt]. Therefore, we have

G(t+Δt) = G(t) G_D(t, t+Δt).    (5)

Like the relay nodes, D may get the message from the source or from the cooperative relay nodes, so

G_D(t, t+Δt) = 1 − λΔt (1 + p(t) E(X(t))) + o(Δt).    (6)

Further, we can obtain

dG(t)/dt = −λ (1 + p(t) E(X(t))) G(t).    (7)

Let U(t) denote the total income of the source up to time t, which equals the benefit minus the expenditure. Therefore, we have

U(t+Δt) = U(t) + B(t) I_D(t, t+Δt) − p(t+Δt) X(t+Δt) C(t) Δt.    (8)

Since the time interval Δt is very small, we can assume that the behavior of the source remains unchanged within it; that is, the source makes a relay node that has the message cooperate with the same probability p(t+Δt) throughout the interval, and the number of relay nodes that have the message is X(t+Δt). Because the source pays the reward C(t) each time it makes a relay node cooperate, the total reward it pays in the interval is p(t+Δt) X(t+Δt) C(t) Δt. In addition, the source obtains the benefit if the destination gets the message, and I_D(t, t+Δt) indicates whether D gets the message in [t, t+Δt], so taking expectations in (8) yields

E(U(t+Δt)) = E(U(t)) + B(t) E(I_D(t, t+Δt)) − p(t+Δt) E(X(t+Δt)) C(t) Δt.    (9)
Because nodes that already have the message do not receive the same message again, if the event I_D(t, t+Δt) happens, the destination did not have the message before. In other words, we have

E(I_D(t, t+Δt)) = Ḟ(t) Δt + o(Δt).    (10)

Combining (9) and (10), we can obtain

E(U̇(t)) = B(t) Ḟ(t) − p(t) C(t) E(X(t)).    (11)
Based on (11), we have

E(U(T)) = ∫ from 0 to T [B(t) Ḟ(t) − p(t) C(t) E(X(t))] dt,    (12)

where T is the maximal lifetime of the message. Our objective is to maximize the value of E(U(T)), which is a functional of p(t). That is, we want to solve the following problem:

maximize E(U(T)) over all controls p(t) with 0 ≤ p(t) ≤ 1, subject to (4) and (7).    (13)
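Before turning to the optimal control, note that the model above is easy to evaluate numerically. The sketch below integrates the state, delivery, and income dynamics with a simple forward Euler scheme and compares always cooperating with never cooperating. The parameter values and the forms of C(t) and B(t) are illustrative assumptions (the same example forms are used later in the validation section), not fitted values.

```python
import math

# Illustrative parameters (assumed, not fitted to any trace).
N = 500        # number of relay nodes
LAM = 1e-5     # exponential inter-meeting parameter (lambda)
T = 10000.0    # maximal message lifetime (seconds)
DT = 1.0       # Euler step (seconds)

def C(t):
    """Reward a relay requires at time t (example form)."""
    return (1 - math.exp(-t / 10000)) / 1000

def B(t):
    """Benefit if the destination receives the message at time t (example form)."""
    return 1000 * math.exp(-t / 10000)

def total_income(policy):
    """Euler integration of the fluid model:
       x' = lam*(N - x)*(1 + p*x)      (infected relays)
       g' = -lam*g*(1 + p*x)           (probability of non-delivery)
       u' = -B(t)*g' - p*C(t)*x        (benefit rate minus payment rate)
    """
    x, g, u = 0.0, 1.0, 0.0   # E(X(0)) = 0, G(0) = 1, U(0) = 0
    t = 0.0
    while t < T:
        p = policy(t)
        dx = LAM * (N - x) * (1 + p * x)
        dg = -LAM * g * (1 + p * x)
        du = -B(t) * dg - p * C(t) * x
        x, g, u = x + dx * DT, g + dg * DT, u + du * DT
        t += DT
    return u

income_er = total_income(lambda t: 1.0)  # always cooperate (epidemic routing)
income_dt = total_income(lambda t: 0.0)  # never pay relays (direct transmission)
```

With these assumed numbers, paying relays all the way to T can cost more than the delivery benefit, matching the later observation that the income of the always-cooperate policy can become negative for large lifetimes.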

Optimal Control.
Obviously, the above problem is an optimal control problem, with p(t) as the control variable. We use Pontryagin's maximum principle ([30, p. 109, Theorem 3.14]) to solve it. According to the principle, we should first construct the Hamiltonian function. Let ((x, g), p) be an optimal solution; at time t, x denotes the value of E(X(t)), g denotes the value of G(t), and p denotes the value of p(t). According to [30], the Hamiltonian is built from the integrand of the objective function and the right-hand sides of the corresponding state equations (4) and (7), with costate functions h_x and h_g. Using Ḟ(t) = −Ġ(t) = λ G(t)(1 + p(t) E(X(t))) from (7), we get the Hamiltonian H:

H = λ B(t) g (1 + p x) − p C(t) x + h_x λ (N − x)(1 + p x) − h_g λ g (1 + p x).    (14)

Note that, at time t, x and g are simply shorthand for E(X(t)) and G(t), respectively. Collecting the terms of (14) that contain p, we have

H = p x [λ B(t) g − C(t) + λ h_x (N − x) − λ h_g g] + λ B(t) g + λ h_x (N − x) − λ h_g g.    (15)

The transversality conditions are as follows [30]:

h_x(T) = 0,  h_g(T) = 0.    (16)

Then, according to Pontryagin's maximum principle ([30, p. 109, Theorem 3.14]), there exist continuous or piecewise continuously differentiable state and costate functions that satisfy

p ∈ arg max over 0 ≤ p* ≤ 1 of H(x, g, (h_x, h_g), p*).    (17)
This relation between the optimal control and the Hamiltonian allows us to express p as a function of the state (x, g) and the costate (h_x, h_g), resulting in a system of differential equations involving only the state and costate functions and not the control function. In fact, (17) means that maximizing the value of E(U(T)) is equivalent to maximizing the corresponding Hamiltonian H pointwise: at a given time t, the state (x, g) and costate (h_x, h_g) can be treated as constants, and p(t) is chosen to maximize H. Therefore, according to (15), we obtain the optimal policy

p(t) = 1 if λ B(t) g − C(t) + λ h_x (N − x) − λ h_g g > 0;  p(t) = 0 otherwise.    (19)

Below, we prove that when C(t) and B(t) satisfy certain conditions, the optimal policy has a simple structure. The conditions are as follows: C(t) is increasing with time t, while B(t) is a nonincreasing function; both C(t) and B(t) are continuous, differentiable, and nonnegative. In fact, the maximal lifetime T of the message is fixed, so the larger t is, the shorter the remaining lifetime T − t. In this case, the relay nodes may think that the source is eager to transmit the message to D quickly, so they may ask for a larger reward; that is, the larger t is, the larger C(t) may be. The condition that C(t) is increasing is therefore rational in some environments. On the other hand, it is better if the destination gets the message earlier, so the assumption that B(t) is a nonincreasing function is rational in certain applications too.
If the above conditions are satisfied, the optimal policy conforms to the threshold form and has at most one jump. In particular, we have the following theorem.

Theorem 1. Suppose that C(t) is increasing, B(t) is nonincreasing, and both are continuous, differentiable, and nonnegative. Then the optimal policy satisfies p(t) = 1 for t < h and p(t) = 0 for t > h, for some threshold 0 ≤ h ≤ T.

Proof. First, note that the functions C(t) and B(t) are nonnegative. In addition, in the proof we simply write x, g, and h_x, h_g for the values of the states and costates at time t. When x = 0, none of the relay nodes has the message, so the value of p(t) cannot have any impact; therefore, we only consider the case x > 0. Based on (15), we define the switching function

φ(t) = λ B(t) g − C(t) + λ h_x (N − x) − λ h_g g,

so that, by (19), p(t) = 1 when φ(t) > 0 and p(t) = 0 when φ(t) < 0. Differentiating φ(t) and using the costate equations together with the transversality conditions (16), one can show that if φ(t) = 0, then φ̇(t) < 0; that is, φ decreases at time t. Next, assume φ(t) < 0; then by (19) we have p(t) = 0, and the same computation again yields φ̇(t) < 0, so φ also decreases at time t. In summary, whenever φ(t) ≤ 0, φ is decreasing at time t. Therefore, if φ(t₀) ≤ 0 for some t₀, then φ(t) < 0 for all t > t₀. According to (19), the optimal policy then satisfies p(t) = 1 for t < h and p(t) = 0 for t > h, where 0 ≤ h ≤ T. That is, once p(t) ≠ 1, it becomes 0 and remains 0 afterwards, so the optimal policy conforms to the threshold form and has at most one jump. This proves that Theorem 1 is correct.

Model Validation.
In this section, we check the accuracy of our framework by comparing the theoretical results obtained from our model with simulation results. We run several simulations using the Opportunistic Network Environment (ONE) simulator [31] in three different scenarios. In the first, we use the well-known random waypoint (RWP) mobility model [32], which is commonly used in mobile wireless networks: 500 nodes move according to the RWP model within a 10000 m × 10000 m terrain at a speed chosen uniformly from 4 m/s to 10 m/s, and the communication range is 5 m; the source and destination are randomly selected among these nodes. The second scenario uses a real motion trace of about 2100 operational taxis in Shanghai, collected by GPS over about one month [33]; the location of each taxi is recorded every 40 seconds within an area of about 102 km², and we randomly pick 500 nodes from this trace, again selecting the source and destination at random. The third scenario is based on the dataset collected at the Infocom 2005 conference [34], which includes 41 attendees connected to each other by Bluetooth; among them, we randomly select two nodes as the source and the destination.
The functions C(t) and B(t) may take any form. For simplicity, we define C(t) = (1 − e^(−t/10000))/1000 and B(t) = 1000 e^(−t/10000). Likewise, p(t) may take any value between 0 and 1 at each time t, but because our main goal is to check the accuracy of our theoretical framework, we only consider two special cases: case 1: p(t) = 1 for t ≥ 0; case 2: p(t) = 0 for t ≥ 0. In the first case, the source makes nodes cooperate all the time, so the message propagates according to the epidemic routing (ER) algorithm; in the second, the source does not ask for help from others at all, so the message propagates according to the direct transmission (DT) algorithm. At the start of each simulation, one message is generated with maximal lifetime T, and each simulation is repeated 20 times. In addition, the maximal message lifetime T increases from 0 to 50000 s.
Based on these settings, we obtain Figures 1, 2, and 3 for the three scenarios, respectively.
From the results, we can see that the average deviation between the theoretical and simulation results is very small: about 4.22% for the RWP mobility model, 5.01% for the Shanghai city motion trace, and 7.12% for the Infocom 2005 dataset. Although the deviation for Infocom 2005 is larger than for the other two traces, the framework can still be considered very accurate. This demonstrates the accuracy of our theoretical framework, so we can use the numerical results it produces to evaluate the performance of different policies. In addition, the results show that the performance differs when the source adopts different policies. In particular, Figures 1 and 2 show that it is not good for the source to request help all the time; for example, when T exceeds 4000 s in Figure 2, the total income of the source may be negative if it requests help all the time. This shows that the policy of the source has an important impact on its total income, which makes our optimal control policy necessary. Later, we will show through extensive numerical results that the optimal policy obtained by (19) is the best.
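The theoretical curves used in such comparisons can be generated directly from the fluid model. For the two constant policies p(t) ≡ 1 (ER) and p(t) ≡ 0 (DT), the state equations do not depend on the deadline, so a single forward pass yields the income for every lifetime T up to 50000 s. The sketch below uses the C(t) and B(t) forms defined above; the values of N and λ are assumptions for illustration, not the fitted trace parameters.

```python
import math

N, LAM, DT = 500, 1e-5, 1.0
TMAX = 50000.0

def C(t): return (1 - math.exp(-t / 10000)) / 1000
def B(t): return 1000 * math.exp(-t / 10000)

def income_curve(p):
    """Cumulative income U(T) under a constant policy p, sampled every 1000 s."""
    x, g, u, t, curve = 0.0, 1.0, 0.0, 0.0, []
    while t < TMAX:
        dx = LAM * (N - x) * (1 + p * x)
        dg = -LAM * g * (1 + p * x)
        u += DT * (-B(t) * dg - p * C(t) * x)
        x, g, t = x + DT * dx, g + DT * dg, t + DT
        if int(t) % 1000 == 0:
            curve.append(u)
    return curve

er_curve = income_curve(1.0)   # epidemic routing income vs. lifetime
dt_curve = income_curve(0.0)   # direct transmission income vs. lifetime
```

Under these assumptions, the DT curve grows monotonically (the source only ever collects benefit), while the ER curve eventually turns negative as payments accumulate, which is the qualitative behavior discussed above.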

Performance Analysis with Numerical Results.
In this section, we use the best exponential fit obtained for the Shanghai city motion trace in the above simulation to set the parameter λ of the intermeeting time distribution. First, we evaluate the performance of the optimal policy obtained by (19). For comparison, we consider three other cases: case 1: p(t) = 1 for t ≥ 0; case 2: p(t) = 0 for t ≥ 0; case 3: a random policy, in which the value of p(t) is selected uniformly at random from the interval [0, 1] at each time t. Other settings are the same as in the simulation, and we obtain Figure 4.
The result in Figure 4 shows that the optimal policy is the best one: under it, the source always gets the maximal total income. This means that our optimal control policy is correct. We further compare the performance of different policies as the number of relay nodes varies. In this case, we set the maximal message lifetime to 10000 s and let the number of relay nodes increase from 50 to 1000, with other settings unchanged. The numerical result, shown in Figure 5, demonstrates that the optimal policy obtained by (19) is again the best.
In addition, the total income under the optimal policy increases with the number of nodes. In fact, when there are more nodes, the source can request help from more nodes early on. Because the reward that the relay nodes request increases with time (e.g., C(t) = (1 − e^(−t/10000))/1000 is an increasing function), this behavior decreases the cost of the source, and the source can stop requesting help earlier. As shown in Theorem 1, the source stops making relay nodes cooperate at a certain time h; in particular, the optimal policy satisfies p(t) = 1 for t < h and p(t) = 0 for t > h, with 0 ≤ h ≤ T. When the number of nodes is larger, the value of h is smaller, so the source stops requesting help earlier and pays less reward. The result in Figure 6 shows that h indeed decreases with the number of nodes. When the number of relay nodes is small, the source has to ask for help for a longer time; for example, when there are 50 relay nodes, the source requires help nearly all the time.
On the other hand, the result in Figure 6 also shows that the optimal policy really conforms to the threshold form. This can be seen more clearly in Figure 7, for 500 and 1000 relay nodes, respectively: the source asks for help from others with probability 1 before the threshold h and then stops doing so.
In the above simulation and numerical results, we defined C(t) = (1 − e^(−t/10000))/1000, which is an increasing function, and the optimal policy conforms to the threshold form in this case. However, C(t) may take any form, so in the rest of this section we examine whether the threshold policy is still better when C(t) has a different form. In particular, we define C(t) = 50 for t ≤ 5000 s and C(t) = 0 for t > 5000 s; it is easy to see that this C(t) is not an increasing function. Other settings are the same as in the simulation. Based on these settings, we obtain Figure 8.
Note that the optimal policy is obtained by (19), whereas a threshold policy conforms to Theorem 1. In fact, each threshold policy corresponds to a specific value of h, so there is a whole family of threshold policies, denoted threshold(h). The threshold policy in Figure 8 is the one that maximizes the total income of the source over all threshold policies.
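Since each threshold policy is indexed by a single number h, the best threshold(h) can be found by brute force over a grid of candidate thresholds, evaluating each with the fluid model. The sketch below does this for the step reward of this section (C(t) = 50 for t ≤ 5000 s, 0 afterwards) under assumed parameters, and also evaluates a "reverse" policy that cooperates only after the reward drops to 0, a policy that no downward-jump threshold can express.

```python
import math

# Assumed parameters; step reward from this section's experiment.
N, LAM, T, DT = 500, 1e-5, 10000.0, 1.0

def C(t): return 50.0 if t <= 5000 else 0.0
def B(t): return 1000 * math.exp(-t / 10000)

def income(policy):
    """Euler integration of the fluid model under control p(t) = policy(t)."""
    x, g, u, t = 0.0, 1.0, 0.0, 0.0
    while t < T:
        p = policy(t)
        dx = LAM * (N - x) * (1 + p * x)
        dg = -LAM * g * (1 + p * x)
        du = -B(t) * dg - p * C(t) * x
        x, g, u = x + dx * DT, g + dg * DT, u + du * DT
        t += DT
    return u

# Best threshold(h): p(t) = 1 for t < h, 0 afterwards, over a grid of h.
candidates = [500.0 * i for i in range(21)]          # h = 0, 500, ..., 10000
best_h = max(candidates, key=lambda h: income(lambda t: 1.0 if t < h else 0.0))
best_threshold = income(lambda t: 1.0 if t < best_h else 0.0)

# Reverse policy: cooperate only once the reward has dropped to zero.
reverse = income(lambda t: 1.0 if t > 5000 else 0.0)
```

Under these assumptions, the reverse policy earns more than the best threshold(h), illustrating why the optimal policy obtained by (19) need not have the threshold form when C(t) is not increasing.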
The result in Figure 8 shows that even this best threshold policy is worse than the optimal policy obtained by (19). This means that the optimal policy does not conform to the threshold form in this case; the form of the function C(t) therefore has a definite impact on the structure of the optimal policy.

Conclusions
To increase efficiency, most routing algorithms in DTNs need nodes to work cooperatively; in particular, nodes should stay in the network to forward the message further after receiving it. However, due to selfishness, nodes have no incentive to stay in the network after getting the message. To make these nodes cooperate, the source has to pay them a certain reward (e.g., C(t)), and this reward may vary with time. On the other hand, if the destination gets the message in a timely manner, the source obtains a certain benefit (e.g., B(t)); for example, the sooner the destination obtains the message, the greater the benefit the source may get. In this paper, we propose a unifying framework to evaluate the total income of the source under different policies. Based on this framework, we study the optimal control problem through Pontryagin's maximum principle and prove that the optimal policy conforms to the threshold form when C(t) and B(t) satisfy certain conditions. Simulations based on both synthetic and real motion traces show the accuracy of our theoretical framework, and numerical results show that the optimal policy obtained by (19) is the best.
Note that once we know the functions C(t) and B(t), we can obtain the theoretical model that evaluates the routing performance under different policies, and we can then derive the optimal policy from (19); nodes just need to conform to this policy. However, C(t) and B(t) are system specific, so their forms may not be known in advance. In that case, a certain learning process is needed to obtain the functions C(t) and B(t) quickly. In other words, in certain applications we have to explore such a learning process, and this will be our future work.