Robust fully distributed file caching for delay-tolerant networks: A reward-based incentive mechanism

This article presents a reward-based incentive mechanism for file caching in delay-tolerant networks. In delay-tolerant networks, nodes rely on the store-carry-forward paradigm of relays to reach the final destination: relay nodes store data in their buffers and carry it until an appropriate contact opportunity with the destination arises. However, relays are not always willing to assist data forwarding because of limited energy or low storage capacity. We propose a reward mechanism to establish and sustain cooperation among relay nodes. We model this distributed network interaction as a non-cooperative game. Namely, the source node offers the relay nodes a positive reward if they accept to cache a given file and successfully forward it to a target destination, whereas the relay nodes may either accept or reject the source's deal, depending on the attractiveness of the reward and on their battery status (their actual energy level). Next, we provide full characterizations of both pure and mixed Nash equilibria. Then, we propose three fully distributed algorithms that ensure convergence to the Nash equilibria (both pure and mixed). Finally, we validate our proposal through extensive numerical examples and learning simulations, and we draw some conclusions and remarks.


Introduction
Communication networks, whether wired or wireless, usually require that an end-to-end link always exists between a sender and a recipient. However, some emerging networks such as delay-tolerant networks (DTNs) 1 introduce new concepts like intermittent connectivity and delay-tolerant services. DTNs are complex distributed systems in which, because of the lack of network infrastructure, a link between a source/destination pair may exist only for brief and unpredictable periods of time. Nevertheless, large delays are tolerated in such networks, and the store-carry-forward paradigm of relays is designed to make a full end-to-end transmission possible. The DTN relays connect over direct links instead of passing through a base station in order to provide enhanced coverage, high data rate, low latency, and low cost.
Consequently, DTN paradigms may improve the resource utilization and may reduce the load of base stations. Furthermore, higher energy efficiency and enhanced network availability/robustness can be achieved. Such a network provides modular solutions to assist data transmission within intermittent forwarding where steady connectivity is hard or even impossible to establish. From a practical perspective, the DTN concept has been extensively used in several fields. For instance, it has been deployed to strengthen military communications, 2 interplanetary networks, 3 underwater networks, 4 Vehicular Ad hoc NETworks (VANETs), 5 and many other applications.
In a DTN context, when a source-relay encounter takes place (i.e. the two nodes are within communication range of each other), the source transmits its content files to the relay. The relay stores/caches them, waits until it experiences a link opportunity, and then forwards them to the target destination. Therefore, the relay nodes must be able to buffer files for long periods of time. However, these relays can be wireless devices with limited battery lifetime and storage capability. Hence, they may not always be available to assist the file transmission, which makes connectivity intermittent and arbitrary. Thus, designing an efficient and effective incentive scheme is crucial. In this article, we build a rewarding mechanism that encourages the relay nodes to participate in the forwarding game. We believe such a scheme makes sense in DTNs and can improve the overall network performance metrics.

Literature review
During the last few years, interest in DTNs has grown tremendously. Most current related research focuses on improving DTN performance (e.g. routing/forwarding, scheduling policy, buffer management, and mobility models). More precisely, these works mostly focus on reducing (1) the message loss probability, (2) the end-to-end delay, (3) the expected number of transmissions, and (4) the energy consumption while maximizing the delivery rate of transmitted messages. However, most of these results rely on strong assumptions and/or are difficult to apply to a realistic network.
In the meantime, numerous research works have been devoted to stimulating DTN relays to participate in the relaying of data. This research direction has gained considerable interest and has become a hot topic. Indeed, several incentive mechanisms have been designed to sustain cooperation among selfish relay nodes. Jiang and Bai 6 presented a survey on incentive mechanisms for DTNs, in which four schemes are compared: (1) virtual currency-based incentive mechanisms, (2) credit-based incentive mechanisms, (3) game-theory-based incentive mechanisms, and (4) combined incentive mechanisms. Hulke and Attar 7 discussed different game-theory-inspired incentive mechanisms and analyzed them while pointing out their advantages and drawbacks. Chahin et al. 8 propose a fixed rewarding mechanism: a Minority Game is constructed to balance incentives for cooperation against energy consumption, and a learning algorithm is then suggested to drive the system to a stable operating point (Nash equilibrium). El-Azouzi et al. 9 use evolutionary game theory and a fixed rewarding mechanism to model the behavior of the relay nodes in a dense DTN environment. Brun et al. 10 consider a simple reward scheme and provide some insights on how the source node could optimize the reward value based on the information received from relays. This incentive mechanism depends on the time at which the meeting with the source occurs and on the information hidden/disclosed by the source (number of message copies, previous meetings with opponents, etc.); both static and dynamic policies have been investigated. Lu et al. 11 present how energy harvesting can be exploited to improve the performance of opportunistic forwarding in mobile DTNs.
In the work by Niyato et al., 12 the authors discuss how the relay nodes behave and form coalitions to help each other to forward a given file while dealing with energy and buffer space constraints. Neglia and Zhang 13 propose an activation forwarding control at relay nodes, but their optimization supposes that relays are always available to cooperate with the source node. However, the relays cannot always be active due to limited energy or buffer space.
Another research direction focuses on buffer management and content popularity. Several studies concentrate mainly on buffer management; for example, Krifa et al. 14 propose an optimal buffer management policy under unlimited bandwidth, which is far from reality. Wang et al. 15 consider that the popularity of the content files is a crucial parameter for distributing them over the relays' buffers. They then propose an algorithm that discovers a near-optimal allocation of files in the network. This work relies on a minor unrealistic assumption, namely that all files have the same size, and on a major one, namely that the network is dense enough to guarantee file transmission to the destination. In the work by Koulali et al., 16 the authors propose a strategic beaconing scheme for DTNs and suggest an application for unmanned vehicle networks (UAVs). The main idea is to strategically switch to sleep mode when a contact opportunity may occur only with very low probability. This mechanism has been shown to meet a delivery-energy tradeoff. In the work by Le et al., 17 the authors propose a cooperative caching scheme based on the social relationship among nodes to improve DTN performance metrics. More precisely, this may help reduce the delay and the cost of message replicas. Yin and Cao 18 design a cooperative caching scheme for mobile ad hoc networks (MANETs), aiming to reduce the delay of future requests by caching the data at certain relay nodes. Numerous other works deal with reducing data redundancy among neighboring nodes in order to create content diversity with a limited cache space. 19,20 However, these caching schemes are hard to implement because of the intrinsically challenging characteristics of DTNs.

Our contribution
In this work, we introduce a mechanism for file caching in DTNs that sustains cooperation among relay nodes by offering them a reward. We evaluate the tradeoff between the reward value and the energy consumed in order to find a rest (stable) point for the distributed network. The mechanism consists of asking the relays to cache a file in exchange for a reward. More precisely, the source node offers a reward to the first relay node that accepts to cache the file and succeeds in forwarding it to the final destination. However, the relay may either accept or decline the source offer depending on the attractiveness of the reward and on its battery status (energy cost). For computational tractability and without loss of generality, we consider two-hop routing 21 to route the files to their destinations. The choice of two-hop routing is also motivated by efficient resource management, that is, of bandwidth, buffer space, and relaying energy. The source node attempts to transmit the file to any relay that it encounters, whereas the relay nodes are allowed to transmit the file only to the final destination.
Due to limited storage and battery constraints, relay nodes may behave in a selfish way, that is, they may not be willing to cooperate all the time. Hence, the interaction among the source node and the relay nodes is naturally modeled as a non-cooperative game. The dynamic interaction among the relays and the source node (all assumed to be rational) pushes the system to converge to an equilibrium point. 22 Each player seeks independently to maximize its own utility function, which also depends on the strategy vector of the other players. It follows that the Nash equilibrium (a strategy profile in which no player/agent has an incentive to deviate unilaterally) is the natural solution concept for such an adversarial situation. We define the payoff of each player as a function of the contact probability (with the source node), the delivery probability (to the destination node), the file lifetime, and the energy consumed. On the one hand, the source sets a reward value and invites relay nodes to accept its deal. On the other hand, each relay has two actions (pure strategies), either to accept (strategy ''a'') or to reject (strategy ''r'') the source's offer. Next, we allow the relay nodes to mix their strategies according to some probability distribution (mixed strategy). In order to reach a delivery-energy tradeoff, we consider that only the first relay that encounters the destination within the file lifetime receives the reward. Thus, the other nodes that have accepted the deal receive nothing and incur an energy penalty. To sum up, the relay faces a decision-making problem in which it acts under the constraints of its battery energy, the reward value, and the probability that another relay delivers the transmitted file before it does.
A preliminary version of this work 23 presents the caching problem for two relay nodes. This scheme is quite unrealistic but still provides some useful insights toward the derivation of the general model. The work presented by Ezzahidi et al. 24 provides a discussion on the reward mechanism for the n-person game but without a detailed theoretical analysis, and it does not consider some important parameters. The major contributions of this article are fourfold and can be summarized as follows: (1) Our scheme is realistic. Indeed, decision-making includes the contact probability, the file lifetime, the energy consumption, and the relay's willingness to cooperate. (2) We design several fully distributed algorithms to reach the game equilibria: our algorithms cover both pure and mixed Nash equilibria, for both discrete action sets (for relay nodes) and continuous action sets (for the source node). (3) Our proposed learning algorithms require only local information (no external information is required), and thus, our scheme scales well. In other words, our framework is suitable for both sparse and dense/ultra-dense networks. It could also capture device-to-device-like communications. (4) We show that the Nash equilibrium of this game has some good fairness features. Indeed, we show that the Price of Anarchy (PoA) is bounded and can be efficiently controlled and brought as close to 1 as one wishes by fine-tuning the file lifetime or, alternatively, by adapting the mobility parameters (speed, directions, stop points, etc.).
The rest of this article is organized as follows. In section ''System architecture and model formulation,'' we describe the system architecture and the problem formulation. We analyze the game equilibria for the two-person case and then for the n-person case in sections ''Two-person caching game'' and ''The n-person caching game,'' respectively. Next, we describe three learning schemes in section ''Learning algorithms and insights for real-world implementation.'' Extensive numerical and simulation results are presented in section ''Numerical investigations.'' Concluding remarks and perspectives are drawn in section ''Conclusion.''

System architecture and model formulation
We consider a DTN with a single source, a single destination, and $n$ relays which are involved in the transmission of files. The files are generated at the source, and the relay nodes should forward them to their final destination. Each file has a utility time $\eta$ (finite horizon) during which the destination is interested in its content. We assume that the relays have sufficient buffer capacity to store the content file and that they are only authorized to forward it to the destination (the essential idea of two-hop routing). Moreover, we suppose that the contact time is sufficient to complete the file transfer when the link between two nodes is up, and that the inter-contact times between any pair of nodes are independent and identically distributed (i.i.d.) random variables. Furthermore, the source node rewards the relay nodes for agreeing to cache-and-forward the file. However, only the first relay that has accepted to cache the file and has succeeded in delivering it to the target destination receives the reward (each file has its own reward). Of course, each participation in the caching game incurs a cost that can be seen as the energy consumed by reception, caching, and transmission operations.
Due to mobility patterns, the contact among the nodes is intermittent. Several works (e.g. Gao et al. 25 ) showed that the inter-contact time between a pair of nodes follows an exponential distribution with rate $\lambda$. Let $p_c$ be the contact probability between two nodes within the file content lifetime $\eta$. It is written as

$$p_c = 1 - e^{-\lambda\eta}.$$

Throughout this article we assume, for clarity and without loss of generality, that every relay node has the same contact probability with the source node. Besides, as mentioned above, at each encounter with the source, the relay chooses either to accept or to reject a given file. Hence, the source shall set an attractive payment strategy to encourage the relays to cooperate. It advertises a positive reward $\alpha \geq 0$ (the earned reward could be a certain credit, an amount of bitcoins, or some reputation bonus that the source uses to send its own files over the DTN) and promises it to the first relay node that accepts to cache the file and successfully forwards it to the target destination. Figure 1 illustrates the interaction between the source node, the relay nodes, and the destination node.
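As a quick numerical illustration of the contact model, the following minimal sketch computes the probability that an exponentially distributed inter-contact time with a given rate elapses within the file lifetime, that is, one minus the exponential of minus the rate times the lifetime. It also computes the mean number of relays met by the source under the assumption that the relays are identical and independent. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def contact_probability(rate: float, lifetime: float) -> float:
    """Probability that an Exp(rate) inter-contact time ends within `lifetime`."""
    if rate < 0 or lifetime < 0:
        raise ValueError("rate and lifetime must be non-negative")
    return 1.0 - math.exp(-rate * lifetime)


def mean_relays_met(n_relays: int, rate: float, lifetime: float) -> float:
    """Expected number of relays met by the source.

    Assumption (for illustration): all relays share the same contact
    probability and meet the source independently of each other.
    """
    return n_relays * contact_probability(rate, lifetime)


if __name__ == "__main__":
    # Hypothetical values: rate 0.5 contacts per time unit, lifetime 2 time units.
    print(contact_probability(0.5, 2.0))
    print(mean_relays_met(10, 0.5, 2.0))
```

A longer lifetime or a higher contact rate both push the contact probability toward 1, which is the lever the source can use when the file lifetime is tunable.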

Source problem
The behavior of the source can be captured from a business perspective, that is, when it generates a file, it associates with it a value $\alpha \in [0, \alpha_{\max}]$. The file value may reflect the importance of the file from the source's perspective. Alternatively, this value may also be perceived as related to the freshness and the popularity of the content file. For the sake of simplicity, we consider that the reward advertised by the source node equals $\alpha$. Thus, the source seeks to minimize this value while ensuring that the relay nodes cooperate with it. When a source-relay contact takes place, if the relay accepts and successfully delivers the file to the destination within the deadline, $t \leq \eta$, the relay receives $\alpha$ from the source node, and the source receives the surplus $\alpha_{\max} - \alpha$. However, when the relay accepts to cache the file but fails to deliver it to the target destination, the source node receives the penalty $-\alpha_{\max}$. The last case arises when no relay node accepts to cache the file; the source is again charged $-\alpha_{\max}$. This last penalty gives the source an incentive to guarantee a non-zero delivery rate.

Relay problem
When a file is generated by a source, an admission control takes place during the utility time $\eta$. The relay has two possible strategies, either to accept (action ''a'') or to reject (action ''r'') this file. Reasoning strategically, the relay chooses whether to accept depending on the reward value $\alpha$ and on the expected energy to be consumed during the relaying operation. Clearly, the mutual strategies induce some payoff for both the relay node and the source node. More precisely, when a successful transmission occurs (i.e. the relay delivers the file to the destination within the file lifetime $\eta$), the relay receives a positive reward $\alpha$ (the file value). However, whenever the relay accepts to cache the file but fails to forward it, it experiences a constant regret $\beta$. We notice here that an energy cost $\sigma_\eta$ applies whenever the relay decides to participate in the game; accounting for it helps to increase the network lifetime. When the relay picks strategy ''r,'' it either bears a cost $\gamma$ if no contact with the destination occurs within $\eta$, or it bears a cost $-\alpha$ if it encounters the destination within $\eta$. This last penalty aims to sustain cooperation and thus to avoid an always-reject strategy.
Remark 1. In a DTN environment, the contact opportunities are sparse. Each node (source, relay, and destination) must therefore advertise periodic beacons to discover its neighboring nodes and become aware of contact opportunities. In this article, we consider that this feature is used by every device, so it can plausibly be omitted; alternatively, it can be added as a fixed additional energy cost. This does not change the obtained results since it only has a shifting effect.
Obviously, each transmission/reception attempt incurs an energy cost. We assume that each transmission consumes an amount of energy denoted by $\sigma_t$, whereas the relay node may experience much higher energy consumption. Indeed, the relay node uses its energy budget to receive the file from the source node, to store the file until it meets the destination or the file deadline expires, and to send the content file to the final destination. Let $\sigma$ be the energy consumed by file storage per unit time. For simplicity and without loss of generality, we include the reception and transmission energy costs in the forwarding energy denoted by $\sigma_\eta$. Thus, we have

$$\sigma_\eta = \frac{\sigma}{\lambda}\left(1 - (1 + \lambda\eta)e^{-\lambda\eta}\right).$$

It has been shown, in the works by Ezzahidi et al. 24 and Altman, 26 that a single relay node delivers a file to the destination within the file lifetime $\eta$ with probability $1 - Q_\eta$, where $Q_\eta$ is the probability that a relay does not succeed in relaying the file to the destination. Moreover, we assume that the relays have the same success probability. Thus, equation (2) becomes

$$1 - Q_\eta = 1 - (1 + \lambda\eta)e^{-\lambda\eta}.$$

Now, we deal with the utility functions of both the source node and the relay node under the interaction details and the reward described above. Besides, it is rational that any action played by a relay depends on the actions taken by the $n - 1$ opponent relays, and thus, the utilities are also functions of the number of relays in the network.
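To make the energy and delivery terms concrete, here is a minimal sketch that evaluates the failure probability and the forwarding energy in the closed form that appears when the article substitutes into the two-person equilibrium conditions: a failure probability of (1 + rate × lifetime) × exp(−rate × lifetime) and an energy term equal to the storage cost times the success probability divided by the rate. One way to read the failure probability, stated here as an interpretation rather than as the authors' derivation, is as the probability that the sum of two independent exponential delays with the same rate (source to relay, then relay to destination) exceeds the lifetime. Function names are hypothetical.

```python
import math


def failure_probability(rate: float, lifetime: float) -> float:
    """Probability that a relay does NOT deliver within `lifetime`.

    Closed form assumed here: (1 + rate*lifetime) * exp(-rate*lifetime).
    """
    x = rate * lifetime
    return (1.0 + x) * math.exp(-x)


def forwarding_energy(storage_cost: float, rate: float, lifetime: float) -> float:
    """Energy term storage_cost * (1 - Q) / rate used in the relay payoff."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return storage_cost * (1.0 - failure_probability(rate, lifetime)) / rate


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for lifetime in (0.5, 1.0, 5.0):
        q = failure_probability(1.0, lifetime)
        e = forwarding_energy(0.1, 1.0, lifetime)
        print(f"lifetime={lifetime}: Q={q:.4f}, energy={e:.4f}")
```

As the lifetime grows, the failure probability tends to 0 and the energy term tends to the storage cost divided by the rate.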
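Let us denote by $P^i_{\mathrm{succ}}(\text{``a''}, z)$ the probability that a given relay $i$, among $z$ participants playing strategy ''a'', succeeds in delivering a given file (i.e. is the first to deliver the file to the destination). We have the following

$$P^i_{\mathrm{succ}}(\text{``a''}, z) = \frac{1 - Q_\eta^{\,z}}{z}.$$

We denote by $n_a$ the number of relays that have accepted to cache the file out of the $\tilde{n}$ relays encountered by the source node. The mean number of nodes encountered by the source equals

$$\tilde{n} = n\,p_c,$$

where $p_c$ is the contact probability between a relay and the source within the file lifetime.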

Relay utility
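The utility function of the relay is defined as the difference between the reward earned from the source node and the energy consumed during the cache-and-forward transaction. We denote by $U_i(\text{``a''}, \alpha)$ the utility function of the relay when it plays its pure strategy accept ''a'', and by $U_i(\text{``r''}, \alpha)$ its utility when it plays its pure strategy reject ''r'':

$$U_i(\text{``a''}, \alpha) = \alpha P^i_{\mathrm{succ}}(\text{``a''}, n_a) - \beta\left(1 - P^i_{\mathrm{succ}}(\text{``a''}, n_a)\right) - \sigma_\eta,$$

$$U_i(\text{``r''}, \alpha) = -\alpha P^i_{\mathrm{succ}}(\text{``a''}, n_a) - \gamma.$$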

Source utility
We define the utility function of the source node as the difference between the maximum value that it is willing to pay and the cost function (expected energy consumed). When $n_a$ of the $\tilde{n}$ encountered relays accept the file, it can be written as

$$U_s(\alpha) = \left(\tilde{n} - n_a\left(P^i_{\mathrm{succ}}(\text{``a''}, n_a) + 1\right)\right)\alpha - \alpha_{\max}\left(\tilde{n} - 2 n_a P^i_{\mathrm{succ}}(\text{``a''}, n_a)\right) - \tilde{n}\sigma_t.$$

Now, we turn to study the nodes' behavior and characterize the equilibrium structure of the caching game. Here, the source and the relay nodes are rational, that is, they know all the system parameters mentioned above and each one seeks to maximize its own objective function. More precisely, we exhibit some properties of the Nash equilibria in terms of existence and uniqueness under both pure and mixed strategies. Under these assumptions, the induced caching game is a one-shot (static) non-cooperative game with complete information. This formalism is suitable for our problem since it models the nodes' behavior clearly, at low complexity and with acceptable computational tractability.
Existence of a Nash equilibrium for the game is given by the well-known Debreu-Fan-Glicksberg theorem: under continuity of each utility function $U_i(p_i, p_{-i})$, quasi-concavity of $U_i$ in $p_i$, and convexity and compactness of the strategy sets, the existence of an equilibrium is guaranteed. Moreover, the equilibrium can be reached through the best-response correspondence.
Hereafter, we derive closed-form expressions for the system equilibria. We first consider the simple case of the two-person game. Next, we generalize our results to the n-person game.

Two-person caching game
In this section, we present our results and characterize the Nash equilibria under pure strategies and mixed strategies.

Pure Nash equilibrium
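When nodes use only pure strategies, we have the following result.

Lemma 1. The strategy profile (''a'', 0) is a pure Nash equilibrium if and only if

$$\gamma \geq \beta Q_\eta + \sigma_\eta,$$

where $Q_\eta = (1 + \lambda\eta)e^{-\lambda\eta}$ is the probability that the relay fails to deliver the file within the lifetime $\eta$ and $\sigma_\eta = \sigma(1 - Q_\eta)/\lambda$ is the forwarding energy.

Proof. The relay plays the pure strategy ''a'' when its payoff from accepting is at least its payoff from rejecting. In this case, the source payoff is a linear function of the reward $\alpha$ with a negative slope, and it reaches its maximum at $\alpha^* = 0$, with $\alpha \in [0, \alpha_{\max}]$. Thus, by substituting $\alpha^* = 0$ and equations (2) and (3) into equation (11), we obtain

$$-\beta(1 + \lambda\eta)e^{-\lambda\eta} - \sigma\,\frac{1 - (1 + \lambda\eta)e^{-\lambda\eta}}{\lambda} \geq -\gamma.$$

After some algebra, we obtain the stated condition, which completes the proof.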
Lemma 2. The strategy profile (''r'', $\alpha_{\max}$) is a pure Nash equilibrium if and only if

$$\alpha_{\max} \leq \frac{\beta Q_\eta + \sigma_\eta - \gamma}{2(1 - Q_\eta)},$$

where $Q_\eta = (1 + \lambda\eta)e^{-\lambda\eta}$ and $\sigma_\eta = \sigma(1 - Q_\eta)/\lambda$.

Proof. Let us assume now that the relay nodes play the pure strategy ''r'' (reject), that is, the payoff from rejecting is at least the payoff from accepting. Again, the payoff function of the source node is a linear function of the reward $\alpha$, but with a positive slope this time. Then, it reaches its maximum at $\alpha^* = \alpha_{\max}$. By substituting $\alpha^* = \alpha_{\max}$ and equations (2) and (3) into equation (15), we obtain

$$-\alpha_{\max}\left(1 - (1 + \lambda\eta)e^{-\lambda\eta}\right) - \gamma \geq \alpha_{\max}\left(1 - (1 + \lambda\eta)e^{-\lambda\eta}\right) - \beta(1 + \lambda\eta)e^{-\lambda\eta} - \sigma\,\frac{1 - (1 + \lambda\eta)e^{-\lambda\eta}}{\lambda}.$$

After some algebra, we obtain the stated condition, which completes the proof.

The pure Nash equilibrium is a natural solution concept. However, such a solution may not always exist; see the existence conditions in Lemma 1 and Lemma 2. Moreover, a pure equilibrium may fail to provide good fairness properties among the game players. In fact, the pure Nash equilibrium (''a'', 0) seems counter-intuitive since the relay cooperates even though the reward is null. This can be explained by considering the content lifetime and the energy costs. When the value of $\eta$ is quite high, the probability $1 - Q_\eta$ that the relay will encounter the destination becomes high, and the relay node then often chooses to cache the content. Here, a relay node would incur a high loss by declining to cache the content, due to the fixed regret and the high encounter probability. For more energy efficiency and better fairness properties, we allow the game players to use mixed strategies. From now on, a Nash equilibrium point is a strategy profile in which each relay accepts to cache the file with some probability, and none of them has an incentive to deviate unilaterally.
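The relay-side comparison used in the two proofs above can be checked numerically. The sketch below evaluates the two sides of the inequality written in the proof of Lemma 2 for an arbitrary reward: accepting yields the reward times the success probability, minus the regret times the failure probability, minus the storage energy; rejecting yields minus the reward times the success probability, minus the rejection cost. It only tests the relay's best response; the source-side argument (linear payoff maximized at an endpoint) is not reproduced. All function names and numerical values are hypothetical.

```python
import math


def failure_probability(rate, lifetime):
    """Assumed closed form: (1 + rate*lifetime) * exp(-rate*lifetime)."""
    x = rate * lifetime
    return (1.0 + x) * math.exp(-x)


def relay_payoffs(reward, beta, gamma, sigma, rate, lifetime):
    """Return the (accept, reject) payoffs of a single relay for a given reward."""
    q = failure_probability(rate, lifetime)
    energy = sigma * (1.0 - q) / rate          # storage energy term
    accept = reward * (1.0 - q) - beta * q - energy
    reject = -reward * (1.0 - q) - gamma
    return accept, reject


def accept_is_best_response(reward, **params):
    """True when accepting is at least as good as rejecting for the relay."""
    accept, reject = relay_payoffs(reward, **params)
    return accept >= reject


if __name__ == "__main__":
    # Hypothetical parameters: long lifetime, moderate rejection cost.
    params = dict(beta=1.0, gamma=0.5, sigma=0.1, rate=1.0, lifetime=5.0)
    print(accept_is_best_response(0.0, **params))   # relay accepts even with zero reward
    params["gamma"] = 0.05
    print(accept_is_best_response(0.0, **params))   # a small rejection cost flips the choice
```

With a long lifetime the failure probability is small, so a zero reward can still be accepted when the rejection cost exceeds the expected regret plus energy, which matches the discussion of the (''a'', 0) equilibrium.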

Mixed Nash equilibrium
When mixed strategies are permitted, the relay node chooses its pure strategies according to some probability distribution. It plays strategy ''a'' with probability $p \in [0, 1]$ and plays strategy ''r'' with probability $1 - p$.
Consider that the source node sets a file value $\alpha$ while the relay accepts the file with probability $p$. The relay mixes between its two pure strategies. At equilibrium, the expected utilities of the two pure strategies are equal, and the relay node is indifferent about which pure strategy to pick. Hence, the equilibrium is obtained by solving the following equation (the indifference principle)

$$U_i(\text{``a''}, \alpha) = U_i(\text{``r''}, \alpha),$$

which allows us to compute the equilibrium content value $\alpha^*$ to be set by the source node. We now turn to the equilibrium forwarding strategy of the relay nodes. The source problem is solved by maximizing its own objective function. At Nash equilibrium, the derivative of its expected utility with respect to $\alpha$ vanishes, namely

$$\frac{\partial U_s(p, \alpha)}{\partial \alpha} = 0.$$

Solving this condition for $p$, given the expected payoff of the source node, yields the equilibrium acceptance probability $p^*$, and the pair $(\alpha^*, p^*)$ is the mixed Nash equilibrium.

Remark 2. When the horizon (file lifetime) is very large, simple formulas for the mixed Nash equilibrium $(\alpha^*, p^*)$ can be found.

PoA
In this subsection, we aim to quantify the performance degradation caused by the selfish behavior of the non-cooperative relay nodes. The loss of efficiency due to decentralized decision-making is often captured using the concept of PoA introduced by Koutsoupias and Papadimitriou. 27 It is defined as the worst-achievable ratio between the aggregate utility of a Nash equilibrium (decentralized setting) and that of the optimal solution (centralized setting). When the value of the PoA is close to 1, the difference between a Nash equilibrium and the optimal solution is not significant, and near-optimal performance can be met without an expensive centralized control. In general, however, the PoA can be arbitrarily large, which means that the decentralized scheme may lead to poor performance and a high loss of efficiency.
Since the Nash equilibrium of our caching game is unique, the PoA can be written as the ratio between $U_{eq}(p^*, \alpha^*)$, the sum of the expected utility functions at the Nash equilibrium, and $U_{opt}(p_{opt}, \alpha_{opt})$, the value of this sum at the globally optimal solution. Let us first compute $U_{opt}(p, \alpha)$, the sum of the expected utility functions (social welfare), and derive its globally optimal solution.

Lemma 3. The globally optimal solution of the function $U_{opt}(p, \alpha)$ occurs at $(\alpha_{opt}, p_{opt})$, the point at which its first-order derivatives vanish.

Proof. The result follows directly by setting the first-order derivatives of $U_{opt}(p, \alpha)$ with respect to $p$ and $\alpha$ to zero.

Let us now compute $U_{eq}(p^*, \alpha^*)$, the sum of the expected utility functions at the Nash equilibrium. Substituting equations (25) and (26) into equation (24) yields the PoA in closed form. This result shows that the PoA of the caching game is bounded for content files with a large lifetime. Moreover, the PoA can be efficiently controlled by fine-tuning the intrinsic parameters of the source, such as $\alpha_{\max}$. Alternatively, the mobility pattern of the relay nodes, that is, the contact rate $\lambda$, can be adjusted to control the loss of efficiency due to decentralized decision-making.

The n-person caching game
This section extends our results to the general case of $n$ relays, a single source, and a single destination. We analyze the interaction/competition among relay nodes and derive some sufficient conditions for the existence of Nash equilibria.

Pure Nash equilibrium
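Proof. In the case where all $\tilde{n}$ relay nodes play their strategy ''a'', each of them must prefer ''a'' to ''r'' (equation (28)). Since $n_a = \tilde{n}$, the source's utility becomes

$$U_s(\text{``a''}, \alpha) = -\tilde{n} P^i_{\mathrm{succ}}(\text{``a''}, \tilde{n})\,\alpha - \tilde{n}\alpha_{\max}\left(1 - 2P^i_{\mathrm{succ}}(\text{``a''}, \tilde{n})\right) - \tilde{n}\sigma_t,$$

which is a linear function of the reward $\alpha$ with a negative slope. Then, it reaches its maximum at $\alpha = 0$. Substituting $\alpha = 0$ into equation (28) yields the first condition, which completes the first part of the lemma.

In the case where all $\tilde{n}$ relays decline to cache the file, that is, they play strategy ''r'', each of them must prefer ''r'' to ''a'' (equation (31)). Since $n_a = 0$, the source node's utility becomes

$$U_s(\text{``r''}, \alpha) = \tilde{n}\alpha - \tilde{n}\alpha_{\max} - \tilde{n}\sigma_t,$$

which is again a linear function of $\alpha$, this time with a positive slope. Thus, it reaches its maximum at $\alpha = \alpha_{\max}$. Substituting $\alpha = \alpha_{\max}$ into equation (31) yields the second condition. The proof of the second part of the lemma is complete.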
The next lemma provides sufficient conditions for existence of a pure Nash equilibrium where some relay nodes accept the source deal, whereas the remaining relay nodes decline the offer.
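Lemma 5. The strategy profile $(A^{n_a}, R^{\tilde{n} - n_a}, 0)$ (resp. $(A^{n_a}, R^{\tilde{n} - n_a}, \alpha_{\max})$) is a pure Nash equilibrium if

$$n_a \geq \frac{\lambda\beta\tilde{n}}{\lambda(2\beta + \gamma) + \sigma(1 - Q_\eta)} \quad \left(\text{resp. } n_a \leq \frac{\lambda(2\alpha_{\max} + \beta)\tilde{n}}{\lambda(2\beta + 2\alpha_{\max} - \gamma) + \sigma(1 - Q_\eta)}\right).$$

Proof. Let us denote by $A$ the group of relay nodes that have accepted to cache-and-forward the file, with cardinality $|A| = n_a$. The group of relay nodes that have declined the source offer is denoted by $R$, with cardinality $|R| = \tilde{n} - n_a$. We have one condition for cooperating relay nodes (equation (33)) and, similarly, one for defecting relay nodes (equation (34)). From equations (33) and (34), the strategy profile $(A, R, \alpha)$, or equivalently $(n_a, \tilde{n} - n_a, \alpha)$, is a pure Nash equilibrium if both conditions hold (equation (36)). The source utility writes

$$U_s(\alpha) = \left(\tilde{n} - n_a\left(P^i_{\mathrm{succ}}(\text{``a''}, n_a) + 1\right)\right)\alpha - \alpha_{\max}\left(\tilde{n} - 2 n_a P^i_{\mathrm{succ}}(\text{``a''}, n_a)\right) - \tilde{n}\sigma_t,$$

which is a linear function of the reward $\alpha$; hence, it reaches its maximum:
1. For $\alpha = 0$, if its slope is negative, that is, $\tilde{n} - n_a\left(P^i_{\mathrm{succ}}(\text{``a''}, n_a) + 1\right) \leq 0 \Rightarrow n_a \geq \tilde{n} / \left(P^i_{\mathrm{succ}}(\text{``a''}, n_a) + 1\right)$. Substituting $\alpha = 0$ into equation (36) completes the proof of the first condition.
2. For $\alpha = \alpha_{\max}$, if its slope is positive, that is, $\tilde{n} - n_a\left(P^i_{\mathrm{succ}}(\text{``a''}, n_a) + 1\right) \geq 0 \Rightarrow n_a \leq \tilde{n} / \left(P^i_{\mathrm{succ}}(\text{``a''}, n_a) + 1\right)$. Substituting $\alpha = \alpha_{\max}$ into equation (36) yields $\alpha_{\max} P^i_{\mathrm{succ}}(\text{``a''}, n_a) - \beta\left(1 - P^i_{\mathrm{succ}}(\text{``a''}, n_a)\right) - \sigma_\eta = -\alpha_{\max} P^i_{\mathrm{succ}}(\text{``a''}, n_a) - \gamma$, which completes the proof of the second condition.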
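The slope argument in the proof above can be illustrated with a short sketch. It treats the source utility as a linear function of the reward whose slope is the number of relays met minus the number of accepting relays times (success probability + 1), and returns the endpoint of the reward interval that maximizes it. The utility expression is the one that appears inside the expected source utility of the n-person game; the success probability is passed in as a plain number, so no particular closed form is assumed for it. Function and parameter names are hypothetical.

```python
def source_slope(n_met, n_accept, p_succ):
    """Coefficient of the reward in the source utility."""
    return n_met - n_accept * (p_succ + 1.0)


def source_utility(reward, n_met, n_accept, p_succ, reward_max, tx_cost):
    """Source utility, linear in `reward` for a fixed number of accepting relays."""
    return (
        source_slope(n_met, n_accept, p_succ) * reward
        - reward_max * (n_met - 2.0 * n_accept * p_succ)
        - n_met * tx_cost
    )


def best_reward(n_met, n_accept, p_succ, reward_max):
    """Maximizer of a linear function on [0, reward_max].

    With a zero slope every reward is optimal; reward_max is returned by convention.
    """
    return 0.0 if source_slope(n_met, n_accept, p_succ) < 0 else reward_max


if __name__ == "__main__":
    # Hypothetical numbers: 4 relays met, success probability 0.2 per accepting relay.
    print(best_reward(4, 4, 0.2, 1.0))   # all accept: negative slope, reward 0
    print(best_reward(4, 0, 0.2, 1.0))   # none accept: positive slope, maximum reward
```

The two printed cases correspond to the all-accept and all-reject profiles; intermediate values of the number of accepting relays switch from one endpoint to the other where the slope changes sign.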

Mixed Nash equilibrium
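Now, as mixed strategies are allowed, the source/relay problems are as follows:

Source problem. The source node seeks to set the best reward value $\alpha^{BR}(p)$, while relay nodes accept to cache the file with some probability $p$.

Relay problem. Given a reward $\alpha \in [0, \alpha_{\max}]$ offered by the source node, each relay seeks to compute the cooperating probability $p^{BR}(\alpha)$ that maximizes its own objective function. When a mixed Nash equilibrium is achieved, each relay node becomes indifferent about which strategy to choose, namely

$$\alpha^* P^i_{\mathrm{succ}}(p^*, n) - \beta\left(1 - P^i_{\mathrm{succ}}(p^*, n)\right) - \sigma_\eta = -\alpha^* P^i_{\mathrm{succ}}(p^*, n) - \gamma,$$

where $P^i_{\mathrm{succ}}(p, n)$ is a finite sum over $j = 1, \dots, n$ expressed in terms of $Z = p(1 - Q_\eta)$. Next, we solve equation (38) and obtain

$$\alpha^* = \frac{\beta\left(1 - P^i_{\mathrm{succ}}(p^*, n)\right) + \sigma_\eta - \gamma}{2 P^i_{\mathrm{succ}}(p^*, n)}.$$

Now, the expected utility of the source node becomes

$$U_s(p, \alpha) = \sum_{n_a = 0}^{n} \binom{n}{n_a} p^{n_a}(1 - p)^{n - n_a}\, U_s(\text{``a''}, \alpha)$$

$$= \sum_{n_a = 0}^{n} \binom{n}{n_a} p^{n_a}(1 - p)^{n - n_a} \left(\left(n - n_a\left(P^i_{\mathrm{succ}}(\text{``a''}, n_a) + 1\right)\right)\alpha - \alpha_{\max}\left(n - 2 n_a P^i_{\mathrm{succ}}(\text{``a''}, n_a)\right) - n\sigma_t\right)$$

$$= \left(n - 1 + \sum_{n_a = 0}^{n} \binom{n}{n_a} (pQ_\eta)^{n_a}(1 - p)^{n - n_a} - \sum_{n_a = 0}^{n} \binom{n}{n_a} p^{n_a}(1 - p)^{n - n_a} n_a\right)\alpha - \alpha_{\max}\left(n - 2 + 2\sum_{n_a = 0}^{n} \binom{n}{n_a} (pQ_\eta)^{n_a}(1 - p)^{n - n_a}\right) - n\sigma_t.$$

At Nash equilibrium, the first-order derivative of the source utility with respect to the reward vanishes, namely

$$\frac{\partial U_s(p^*, \alpha)}{\partial \alpha} = n - 1 + \sum_{n_a = 0}^{n} \binom{n}{n_a} (p^* Q_\eta)^{n_a}(1 - p^*)^{n - n_a} - \sum_{n_a = 0}^{n} \binom{n}{n_a} (p^*)^{n_a}(1 - p^*)^{n - n_a} n_a = 0.$$

Notice that the left side of the previous equation (42) is a continuous, strictly decreasing function of $p$ on $[0, 1]$ with $\partial U_s(0, \alpha^*)/\partial \alpha^* = n > 0$ and $\partial U_s(1, \alpha^*)/\partial \alpha^* = Q_\eta^n - 1 < 0$; hence, $\partial U_s(p^*, \alpha)/\partial \alpha = 0$ has a unique solution $p^*$.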
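Because the first-order condition above is strictly decreasing in the acceptance probability and changes sign on the unit interval, its root can be found by bisection. The sketch below uses the binomial theorem to collapse the two sums in the coefficient of the reward: the first sum equals (1 − p(1 − Q))^n and the second equals n·p, so the condition reads n − 1 + (1 − p(1 − Q))^n − n·p = 0. This simplification is an added step, not one stated in the text, and the function names are hypothetical.

```python
def reward_slope(p, n, q):
    """Coefficient of the reward in the source's expected utility.

    p: acceptance probability, n: number of relays, q: failure probability.
    """
    return n - 1.0 + (1.0 - p * (1.0 - q)) ** n - n * p


def solve_accept_probability(n, q, tol=1e-12):
    """Bisection on [0, 1]; relies on reward_slope being strictly decreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reward_slope(mid, n, q) > 0:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical values: failure probability 0.3 for several network sizes.
    for n in (1, 2, 5, 20):
        print(n, solve_accept_probability(n, 0.3))
```

For a single relay the condition is linear and the root is 1 / (2 − Q), which gives a convenient sanity check on the solver.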
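Summary. The caching game has a unique mixed Nash equilibrium, fully characterized by the pair $(p^{*}, a^{*})$, where $p^{*}$ is the unique solution of equation (42) and $a^{*}$ follows from equation (38).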

PoA
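The caching game has a unique Nash equilibrium $(p^{*}, a^{*})$, and thus the PoA is obtained by comparing the social welfare at this equilibrium, $W(p^{*}, a^{*})$, with the optimal social welfare $W(p_o, a_o)$. Here, $W(p, a)$ is the social welfare, that is, the sum of the expected utility functions, and the pair $(p_o, a_o)$ denotes the maximizer of the social welfare function.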

Learning algorithms and insights for real-world implementation
We discuss here how our incentive mechanism can be implemented in real-world networks. The proposed reward mechanism can be implemented simply and efficiently through distributed learning algorithms. Acknowledgements (ACKs) about the state of a transmission (success or failure) may only arrive after large delays, so the nodes cannot rely on them at every decision step. We therefore propose distributed learning algorithms that allow the relays and the source to learn their individual best decisions over time, independently and based only on locally available information; that is, they use local observations to estimate their payoffs. Once a successful delivery occurs, each node receives its real reward, which allows it to correct its learning trajectory.

Learning scheme for the source node
At each instant $k$, the source chooses a real value $a$ from the interval $[0, a_{\max}]$, depending on the current strategy probability distribution of the relays; its problem is to find the value of $a$ that maximizes its payoff. It picks this value repeatedly and independently, based on its local observations and on its estimate of its own payoff at each learning step. To solve the source optimization problem, we suggest a stochastic approximation scheme, namely the Kiefer-Wolfowitz scheme; the reader is invited to check Kiefer and Wolfowitz 28 and Thathachar and Sastry 29 for more details.

Learning schemes for the relay node
The relays' problem consists of choosing the best response strategy (facing both the source policy and the policies of the other relays) that maximizes their payoffs over time. At any time slot, every relay node selects a pure strategy according to a probability distribution. In order to discover a pure Nash equilibrium, we consider the well-known Linear Reward-Inaction (LRI) algorithm. The LRI scheme requires only a small amount of local information: the previous value of the probability distribution, the actual realization of the selected pure strategy, and the reward received at the previous step. Since LRI can only discover pure equilibria, we suggest a modified version called the Linear Reward-$\epsilon$-Penalty ($L_{R-\epsilon P}$) algorithm to reach the mixed Nash equilibrium. 29

Source node: learning continuous action
In previous sections ''Two-person caching game'' and ''The n-person caching game,'' we showed that, from a designer perspective, the source objective function $U_s$ has a unique maximizer at $a^{*}$. In a realistic network, the source node is aware neither of its objective function nor of the structure of the Nash equilibrium. However, the source node periodically receives a realization of its actual reward function as a numeric value. This realization is a local observation $\hat{U}_s(\hat{a})$ that may help the source node to construct its own utility function and thereby learn its equilibrium strategy. The Kiefer-Wolfowitz stochastic approximation algorithm uses an instantaneous estimate of the gradient to update the learning scheme. Thus, whenever the learning trajectory goes in the right direction, the gradient stays positive. Otherwise, the gradient becomes negative and the dynamics bring the trajectory back to the right direction. The trajectory settles when the gradient vanishes.
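The update rule moves the current estimate $\hat{a}$ along an estimated gradient of the observed utility, using a step size $m_k$ and a perturbation width $c_k$ at learning step $k$. If the step sizes $m_k$ and $c_k$ satisfy the summability conditions of the scheme (stated for a constant $0 < r < \infty$), then the algorithm converges to a local maximum of $U_s$ under some conditions on the function $U_s$ and on the variance of the observations $\hat{U}(\hat{a})$. 28 The usual choices for the step sizes are $c_k = c_1 / k^{g}$ and $m_k = m_1 / k$ with $0 < g < 1/2$, $m_1 > 0$, and $c_1 > 0$.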
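To make the source learning scheme concrete, the following Python sketch implements the textbook form of the Kiefer-Wolfowitz iteration, in which the gradient is estimated by a two-point finite difference of noisy observations. It is an illustration under assumptions, not the exact update used in the article: the function name, the projection onto $[0, a_{\max}]$, the default constants, and the toy concave utility with Gaussian noise in the usage lines are all hypothetical.

```python
import random


def kiefer_wolfowitz(observe, a0, a_max, steps=5000, m1=1.0, c1=0.5, g=0.25):
    """Textbook Kiefer-Wolfowitz ascent on a noisy utility observed via `observe`.

    Step sizes follow m_k = m1 / k and c_k = c1 / k**g with 0 < g < 1/2.
    Iterates and perturbed points are clipped to [0, a_max] (an assumption).
    """
    a = a0
    for k in range(1, steps + 1):
        m_k = m1 / k
        c_k = c1 / k ** g
        upper = min(a + c_k, a_max)
        lower = max(a - c_k, 0.0)
        # Finite-difference estimate of the gradient from two noisy observations.
        grad = (observe(upper) - observe(lower)) / (upper - lower)
        a = min(max(a + m_k * grad, 0.0), a_max)
    return a


if __name__ == "__main__":
    rng = random.Random(0)

    def noisy_utility(a):
        # Hypothetical concave utility with maximizer 1.2 plus observation noise.
        return -(a - 1.2) ** 2 + rng.gauss(0.0, 0.1)

    print(kiefer_wolfowitz(noisy_utility, a0=0.5, a_max=3.0))
```

In a deployment, `observe` would return the payoff realization that the source measures locally after offering a given reward.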

Relay node: learning pure Nash equilibrium
We use the LRI algorithm for the relays to learn their respective best response strategies (pure strategies only). Every relay updates a probability distribution independently over time in order to converge progressively to the best strategy. Initially, the relay chooses a strategy according to the initial probability distribution; after each time instant, the algorithm increases the probability of picking strategies with a high observed utility and decreases the probability of choosing the other strategies. Let $\mathbf{p}_k = (p_k, 1 - p_k)$ be the probability distribution at instant $k$, where $p_k$ denotes the probability of accepting the source deal at time instant $k$. In the update rule, $e_k$ is the indicator function, which equals 1 when the action ''a'' (accept the source deal) is selected at time instant $k$ and 0 otherwise, and $0 < h_k < 1$ is the learning speed at time instant $k$. One can easily check that these dynamics converge to a pure Nash equilibrium when one exists. Unfortunately, they may also converge to pure profiles that are not equilibria. This is why the initialization phase is crucial for such a learning scheme.
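As an illustration of the relay dynamics described above, the Python sketch below uses the standard two-action form of the Linear Reward-Inaction update, $p_{k+1} = p_k + h_k\, u_k\, (e_k - p_k)$, where $u_k$ is the observed payoff normalized to $[0, 1]$. This is the textbook form and is assumed rather than copied from the article; the function names, the constant learning speed `eta`, and the fixed payoffs in the usage lines are hypothetical.

```python
import random


def lri_step(p, accepted, payoff, eta):
    """One Linear Reward-Inaction update of the acceptance probability p.

    accepted: True if action "a" was played at this step (indicator e_k = 1).
    payoff:   observed utility, assumed normalized to [0, 1].
    eta:      learning speed, 0 < eta < 1.
    """
    e_k = 1.0 if accepted else 0.0
    return p + eta * payoff * (e_k - p)


def run_lri(payoff_accept, payoff_reject, p0=0.5, eta=0.05, steps=2000, seed=0):
    """Simulate one relay facing fixed (hypothetical) normalized payoffs."""
    rng = random.Random(seed)
    p = p0
    for _ in range(steps):
        accepted = rng.random() < p
        payoff = payoff_accept if accepted else payoff_reject
        p = lri_step(p, accepted, payoff, eta)
    return p


if __name__ == "__main__":
    # Accepting pays more than rejecting, so p should drift toward 1.
    print(run_lri(payoff_accept=0.9, payoff_reject=0.2))
```

Because a zero payoff leaves the distribution unchanged (the "inaction" part) and the pure strategies are absorbing, the outcome depends on the starting point `p0`, which is consistent with the remark on initialization.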

Relay node: learning mixed Nash equilibrium
Ultimately, the derived mixed Nash equilibrium exhibits some nice features. In particular, it has better energy efficiency and better fairness properties, since each relay node may participate in the caching game.
To guarantee convergence to such an equilibrium, we propose the $L_{R-\epsilon P}$ algorithm to drive the relay nodes to their respective equilibrium acceptance probabilities $p^{*} = (p^{*}_1, p^{*}_2, \ldots, p^{*}_n)$. In practice, every relay node updates its own probability distribution over time, independently of its opponents, according to the $L_{R-\epsilon P}$ rule, where $h_2 = \epsilon \cdot h_1$, $\epsilon < 1$ is a small real number, and $m = 2$ is the number of relay strategies (either ''a'' or ''r''). At each iteration, the relay randomly chooses a strategy according to its own probability distribution and updates the strategy probability vector at instant $k$ until convergence, that is, until achieving $p^{*} \in \arg\max_{p} U_r(a^{*}, p)$.
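The following Python sketch shows one common textbook form of the Linear Reward-$\epsilon$-Penalty update for $m = 2$ actions, with reward rate $h_1$ and penalty rate $h_2 = \epsilon \cdot h_1$. It is offered as an assumption about the shape of the rule, not as the article's exact equation: payoffs are assumed normalized to $[0, 1]$, and the function names, constants, and fixed payoffs in the usage lines are hypothetical.

```python
import random


def lrep_step(p, accepted, payoff, eta1, eps):
    """One Linear Reward-epsilon-Penalty update for two actions.

    p:        current probability of accepting the source deal.
    accepted: True if "a" was played, False if "r" was played.
    payoff:   observed utility, assumed normalized to [0, 1].
    eta1:     reward learning rate; the penalty rate is eta2 = eps * eta1.
    """
    eta2 = eps * eta1
    p_sel = p if accepted else 1.0 - p  # probability of the action just played
    # Reward pulls p_sel toward 1; the small penalty pushes it away.
    p_sel = p_sel + eta1 * payoff * (1.0 - p_sel) - eta2 * (1.0 - payoff) * p_sel
    return p_sel if accepted else 1.0 - p_sel


def run_lrep(payoff_accept, payoff_reject, p0=0.5, eta1=0.05, eps=0.1,
             steps=3000, seed=0):
    """Simulate one relay facing fixed (hypothetical) normalized payoffs."""
    rng = random.Random(seed)
    p = p0
    for _ in range(steps):
        accepted = rng.random() < p
        payoff = payoff_accept if accepted else payoff_reject
        p = lrep_step(p, accepted, payoff, eta1, eps)
    return p


if __name__ == "__main__":
    print(run_lrep(payoff_accept=0.9, payoff_reject=0.2))
```

Unlike LRI, the penalty term keeps the pure strategies from being absorbing, which is what lets the probabilities settle at interior (mixed) values.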

Numerical investigations
In this section, we present some numerical results and simulation runs to illustrate the behavior of our algorithms. Unless otherwise stated, we use the following setting: $a_{\max} = 3$, $b = 0.04$, $s = 0.25$, $s_t = 0.02$, and $g = 0.05$. Next, we plot the performance metrics of the caching game at the Nash equilibria while varying the horizon $h$ (file lifetime) and the mobility parameter $l$, which quantifies the contact rate. The considered metrics are the caching acceptance probability $p^{*}$, the equilibrium reward value $a^{*}$, the utility functions, and the PoA. Figure 2(a) and (b) depicts the probability of acceptance $p^{*}$ and the file value $a^{*}$ at the Nash equilibrium as functions of the file lifetime $h$. We note that the file value (the reward given to the relay node) increases as the horizon increases. Meanwhile, the relay node tends to decrease its willingness to cache the file as the value of $a$ increases. This behavior can be explained as follows. On the one hand, as the file lifetime increases, the relay nodes have less incentive to accept the caching offer, since storing the file consumes too much energy. On the other hand, the source node seeks to convince the relay to cache the file by offering better deals with a higher reward value. Moreover, we notice that the source has an incentive to offer a lower reward as the contact rate increases, which is quite intuitive since the caching transaction would not consume too much energy. The relay nodes, in contrast, seem to behave in a somewhat counter-intuitive way: they have more incentive to accept caching the file when the probability of contacting the destination is low. This can be explained by the higher reward offered by the source node and the low energy investment as the contact probability decreases.
We plot in Figure 3(a) and (b) the equilibrium payoff functions of the source and the relay, respectively, over the file lifetime $h$. For low values of $h$, the source utility increases until a given threshold $h_{th} = h(l)$ and then converges to the maximum payoff value $U_s^{*} = \lim_{h \to \infty} U_s(a^{*}, p^{*}) = -9$. We remark that the threshold horizon $h_{th}$ increases as the inter-contact rate $l$ decreases. As for a relay node, we notice that its utility is strictly decreasing as $h$ goes to infinity. A special feature is that, for both nodes, it is more interesting to consider a long enough file lifetime with some reasonable contact probability.
Figure 4(a) and (b) shows the impact of the network density (the number of relay nodes) on the equilibrium probability of acceptance and on the file value. We notice that $p^{*}$ and $a^{*}$ both increase as the number of relay nodes increases. When the network becomes dense or ultra-dense, the competition among relay nodes gets fierce, so the probability of being the first to deliver the file content gets smaller and smaller. This may push relay nodes to decline the source deal more often, so the source node has an interest in increasing the reward it offers, which in turn improves the acceptance probability. We observe again that the relays are more interested in files with a short lifetime.
Next, we depict in Figure 5(a) and (b) the impact of the network density on the reward value and on the probability of acceptance for different contact rates $l$. Now, the source node offers a higher reward for low values of $l$, which is quite intuitive. This is indeed an incentive action to ensure an acceptable delivery probability even though the network topology is not changing enough to create contact opportunities. Moreover, the value of the reward increases until a given threshold $n_{th} = n(l)$ and then saturates at $a_{\max} = 3$. It is a surprising feature that this saturation threshold increases as $l$ increases. This means that the source node can still manage to pay less when the network size is large, taking advantage of the competition among relay nodes.
Now, we focus on the loss of efficiency experienced at the Nash equilibrium. We depict in Figure 6 the PoA while changing the file lifetime $h$ for different values of the contact rate $l$. We note that the PoA is very close to 1 for average and high values of $l$, which means that the equilibrium performs as well as the social optimum. For low values of $l$, the PoA remains bounded and the loss of efficiency is not very significant, which means that our scheme still drives the system to a near-optimal point. One also notices that increasing the file lifetime $h$ may increase the equilibrium efficiency. Hence, a good efficiency could be reached using some cross-layer optimization that includes the protocol stack and the network topology. For instance, one can adjust the file lifetime (application layer) and the mobility parameters (physical layer and network topology).
The remainder of this section illustrates the behavior of the learning algorithms discussed above in section ''Learning algorithms and insights for real-world implementation.'' We consider $h = 100$ and $l = 0.5$, and at least 10 runs have been averaged to produce the learning trajectories. Figure 7(a) and (b) shows a setup where the pure equilibrium $(A_{n_a}, R_0, 0)$ is discovered. Next, Figure 8(a) and (b) indicates the convergence to the pure equilibrium $(A_0, R_k, a_{\max})$. We note that the probabilities of acceptance of the five relays converge to the same value, either 0 or 1, according to the considered setting. A convergence illustration of the mixed equilibrium $(p^{*}, a^{*})$ is depicted in Figure 9(a) and (b). The three algorithms exhibit good properties in terms of convergence speed and convergence accuracy. However, a theoretical convergence analysis is still worth consideration and is crucial to properly address all convergence-related issues. In particular, this would optimize implementation efforts and make such an incentive mechanism more efficient.

Conclusion
In this work, we propose a reward-based incentive mechanism for file caching in DTN environments under constraints of energy consumption, file lifetime, and contact rate. The source-relay interaction is captured using a non-cooperative game. We give some conditions for the existence of pure Nash equilibria. Then, we explicitly derive the unique mixed equilibrium of this caching game, which has some attractive features. For instance, it exhibits better fairness properties and improved energy efficiency in comparison with pure equilibria. Furthermore, we propose three fully distributed algorithms to discover both the pure equilibrium and the mixed equilibrium. The first scheme allows the source node to learn its equilibrium strategy (continuous action). The second scheme is used by relay nodes to learn their respective pure strategies (discrete strategy), whereas the third scheme is designed to explore the relay nodes' mixed strategies at equilibrium.
As part of our future work, we aim to implement such a mechanism in a real-world network using a DTN framework. We are also interested in building a more energy-efficient mechanism by allowing relay nodes to switch to sleep mode to save energy while keeping the delivery probability above some given threshold.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.