Improved Message Diffusion Model for Node Coverage Problem of Ad Hoc Network Based on Node Visit Times

Studying the message diffusion process is important for the node coverage problem, which in cooperative communication systems such as ad hoc and unstructured P2P networks is commonly abstracted as a random sampling model. However, message diffusion in an ad hoc network is not a completely independent random process: when nodes forward messages, they are influenced by factors such as node degree, visit times, and network connectivity. The random sampling model ignores these factors and therefore overestimates node coverage. This paper discusses the message diffusion process of cooperative communication systems such as ad hoc networks, analyzes why the random sampling model is inaccurate, and corrects it by introducing node degree and visit times. For the E-R random network topology, the proposed model is validated against simulation results. Compared with the random sampling model, the proposed model agrees better with the simulated ad hoc message diffusion process when the network is connected, and its accuracy meets the requirements when 3 or 5 visits are considered.


Introduction
Ad hoc and unstructured P2P networks are autonomous, peer-to-peer, multihop systems whose nodes must forward messages from other nodes as well as their own in order to share resources and communicate cooperatively. Searching for and locating nodes and resources in an ad hoc network is done by transmitting search messages within the network. Therefore, research on message diffusion models plays an important role in cooperative communication systems such as ad hoc networks [1]. In social networks, too, a message diffusion model can be used to analyze how gossip messages spread through the social relationships of a crowd.
The main objective of analyzing the message diffusion model in cooperative communication systems is generally to determine how many messages must be forwarded to reach a given number of nodes or, given a number of messages, what ratio of nodes receives them. In general, it is desirable to cover as many nodes as possible with as few messages as possible, which is the typical node coverage problem [2].
There is similar research on node coverage for keyword search through message diffusion in unstructured P2P networks [2,3]. Although nodes may be far apart in the physical network, a specific overlay network topology can be built to provide logical links that deliver messages directly to remote nodes and improve the efficiency of message diffusion. Take the small-world network for instance: its topological structure lies between the regular network and the random network and is characterized by a small network diameter and high node clustering, resulting in fast message transmission. Because the message diffusion process of an unstructured P2P network is not limited by physical distance, studying the node coverage problem there means analyzing how the topological features of the overlay network affect the efficiency and cost of message transmission.
But in an ad hoc network, nodes can join and leave dynamically, and sometimes they can even move, as in vehicular ad hoc networks and mobile sensor networks; therefore, the topological structure of the whole network is dynamic and random. The computing and storage capacity of ad hoc nodes is too weak to hold the whole topological structure and the distribution of other nodes and resources [4], let alone to build an overlay network. In addition, the communication capability of ad hoc nodes is limited, and they are incapable of remote communication. Under these conditions, the message diffusion process is limited by physical distance, so a message can only be passed to as many nodes as possible through neighbor nodes. During this process, response time and message overhead are the relevant measures of system performance. Methods including flooding [5,6], random walk [7-13], and multiple random walks [14,15] can be adopted. Flooding is fast, but its message overhead is huge. The major methods, random walk and multiple random walks, perform well in node coverage and message overhead, but their response is slow.
Generally speaking, when random walk is used to transmit messages in an ad hoc network with a large number of nodes, the ultimate state of node coverage can be analyzed by random sampling models, including random pick [16,17] and the coupon collector's problem [18,19]. But forwarding messages by random walk or multiple random walks is not equivalent to "independent random sampling." In fact, the forwarding of a message usually depends on the current node state: the probability of reaching a new node depends on the unvisited nodes in the current network as well as on the current state of the forwarding node. The independent random sampling process does not take the latter into account; thus, when the node coverage rate at a certain time point is analyzed, the estimated value tends to be too high.
This paper uses node degree and visit times to address the problems of the random sampling model. The simulation results show that the proposed message diffusion model for ad hoc networks agrees with the observed behavior when node coverage is high.
Section 2 discusses the current problems of the random sampling model. Section 3 focuses on node coverage in the message diffusion process and, starting from the general model, examines the effects of node degree and visit times to obtain the proposed model. Section 4 validates the model through simulation experiments and studies how well the theoretical values agree with the simulation results in networks of different scales and node degrees. Section 5 concludes.

Random Sampling and Its Shortcomings
At present, random walk and multiple random walks are the two major methods studied for cooperative communication networks such as ad hoc networks. Both methods assume that a node forwards a message over a path different from the one on which it was received; that is, the outgoing path is picked at random from all paths except the incoming one. The only difference between them is that multiple random walks use several outgoing paths per message; otherwise they do not differ in any essential way. When the number of nodes in the network is large enough, the limit of node coverage can be analyzed by random pick [16,17] and the coupon collector's problem [18,19]. Node coverage here refers to the fraction of nodes in the network reached by the messages.

Random Pick Model
The random pick model can be described as follows: there are $N$ balls in a box, and at each step one ball is taken out and then returned to the box. After $m$ draws, how many distinct balls $n$ have been taken out? The general result [16,17], for the initial condition $m = 0$ and $n = 0$, expresses $n$ as a function of $m$:

$$n = N\left(1 - e^{-m/N}\right). \tag{1}$$

One typical application of the random pick model to node coverage is estimating the success rate of searching for nodes and resources: if $r$ target resources are randomly distributed in the ad hoc network, the probability of finding a target resource after sending $m$ messages, that is, the success rate of resource search $P_s$, can be described as a function of $m$:

$$P_s = 1 - \left(1 - \frac{r}{N}\right)^{m}. \tag{2}$$

2.2. Coupon Collector's Problem. The coupon collector's problem is another typical random sampling process [19]. It can be described as follows: there are $N$ kinds of coupon, each kind is equally likely to be drawn, and one coupon is drawn at a time; how many draws $T$ are needed to collect all $N$ kinds?
As is well known, the expected number of draws needed to collect all $N$ kinds of coupon is a function of $N$:

$$E(T) = N H_N, \tag{3}$$

where $H_N$ is the $N$th harmonic number, and if $N \to \infty$, then

$$H_N \approx \ln N + \gamma + \frac{1}{2N}. \tag{4}$$

Here $\gamma$ is the Euler-Mascheroni constant, $\gamma \approx 0.5772156649$. The expected number of draws $m$ needed to collect $n$ kinds of coupon is a function of $n$:

$$m = N\left(H_N - H_{N-n}\right) \approx N \ln\frac{N}{N-n} + \frac{1}{2} - \frac{N}{2(N-n)}. \tag{5}$$

When $N \to \infty$ and $n \to \infty$ (with $N - n$ remaining large), the latter two terms can be neglected; then

$$m \approx N \ln\frac{N}{N-n}. \tag{6}$$

If the question becomes how many different coupons $n$ have been collected after $m$ draws, this is the inverse problem of the coupon collecting model. Solving (6) for $n$ gives the relationship between $n$ and $m$:

$$n = N\left(1 - e^{-m/N}\right). \tag{7}$$

Clearly, (7) and (1) are identical, which means that the random pick model with initial conditions $m = 0$ and $n = 0$ and the coupon collecting model with sufficiently large $N$ and $n$ are a pair of equivalent inverse problems.
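The equivalence between the random pick result and the inverted coupon collector result can be checked numerically. The following Python sketch is illustrative only: the function names and the chosen values of $N$ and $m$ are assumptions, and the symbols $N$, $m$, $n$ follow the notation used in this section. It compares the exact expectation for sampling with replacement, the large-$N$ closed form, and a small Monte Carlo run.

```python
import math
import random


def expected_distinct(N, m):
    """Exact expected number of distinct items after m uniform draws with replacement from N."""
    return N * (1.0 - (1.0 - 1.0 / N) ** m)


def expected_distinct_limit(N, m):
    """Large-N closed form n = N * (1 - exp(-m / N))."""
    return N * (1.0 - math.exp(-m / N))


def draws_needed(N, n):
    """Inverse relation m = N * ln(N / (N - n)): draws needed to see n distinct items."""
    return N * math.log(N / (N - n))


def simulate_distinct(N, m, rng):
    """One Monte Carlo run: count the distinct items seen in m draws with replacement."""
    return len({rng.randrange(N) for _ in range(m)})


if __name__ == "__main__":
    rng = random.Random(0)
    N = 1000
    for m in (500, 1000, 3000):
        runs = [simulate_distinct(N, m, rng) for _ in range(200)]
        mean = sum(runs) / len(runs)
        print(m, round(mean, 1), round(expected_distinct(N, m), 1),
              round(expected_distinct_limit(N, m), 1))
```

For $N$ in the thousands the exact expectation and the exponential form differ by less than one node, which is why the two models can be treated as the same limit model.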

Problems of Random Sampling Model
The above shows that the limit of node coverage can be analyzed quantitatively by random sampling models, including the random pick model and the coupon collector's problem, if there are numerous nodes in the ad hoc network ($N \to \infty$) and the transmitted messages cover most of the nodes ($n \to \infty$). But neither of these 2 models can exactly describe the node coverage at a given time point. The random sampling model is a limit model rather than a process description model, which is the first problem encountered when analyzing ad hoc network node coverage. Moreover, the node coverage of an ad hoc network changes during the message diffusion process. This process is not a completely independent random process: when nodes forward messages, they are influenced by node degree and network connectivity. The probability of forwarding a message to a new node depends on the undiscovered nodes in the network as well as on the current state of the forwarding node. However, the random sampling model simply assumes that successive samplings are independent and random, without considering that the current node may already have been visited many times. As a result, the estimate tends to be too high if the random sampling model is used to estimate ad hoc network node coverage at a given time point.

Improved Ad Hoc Message Diffusion Model Based on Visit Times
When analyzing the node coverage process of cooperative communication systems such as ad hoc networks, the random sampling model does not take the current node state into account. To solve this problem, this paper introduces node degree and visit times. This section first presents a general algebraic model for the quantitative analysis of node coverage in ad hoc message transmission and then analyzes the effects of node degree and visit times on node coverage to derive the modified model.

General Algebraic Model.
Suppose that the ad hoc network topology follows the E-R random graph model [20], denoted $G(N, p)$: there are $N$ nodes in the graph and the link probability between any 2 nodes is $p$ ($0 < p < 1$), so the average node degree is $p \cdot (N - 1)$. For simplicity, we assume that the node degree is $d = p \cdot (N - 1)$. Assume that the graph is connected and that a node forwards a message over a path different from the one on which it was received; that is, the outgoing path is picked at random from all paths except the incoming one. At a given time point of the message diffusion process, $m$ denotes the number of messages forwarded and $n$ the number of nodes that have received the message. As time goes on, $n$ grows with $m$ and gradually approaches $N$, so $n$ can be described as a function $n(m)$. If a node now forwards the message to a neighboring node, the probability that the message reaches a new node is $(N - n)/(N - 2)$. After the message is forwarded, the expected number of nodes covered by the message is $n + (N - n)/(N - 2)$. When $N$ is large enough, $N - 2$ can be replaced by $N$:

$$n(m+1) = n(m) + \frac{N - n(m)}{N}. \tag{8}$$

If $N \to \infty$, then

$$\frac{dn}{dm} = \frac{N - n}{N}. \tag{9}$$

The solution of the equation is

$$n(m) = N - C e^{-m/N}, \tag{10}$$

where $C$ is a constant determined by the initial conditions. Generally, when $m = 0$, $n(0) = 0$; then we get

$$n(m) = N\left(1 - e^{-m/N}\right). \tag{11}$$

3.2. Parameter Modification. The general algebraic model allows a quantitative analysis of the node coverage at a given time point of the message forwarding process in cooperative communication systems such as ad hoc networks. But it does not consider the node degree and therefore overestimates node coverage. The general algebraic model assumes that, once $m$ messages have been transmitted to $n$ nodes, the probability that a message is transmitted to a new node, according to (8), equals the ratio of the number of unvisited nodes to the total number of nodes, that is, $(N - n(m))/N$.
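The step from the recurrence of the general algebraic model to its exponential solution can be verified by iterating the recurrence directly. The Python sketch below is a minimal illustration under the notation of Section 3.1 ($N$ nodes, $m$ forwarded messages, $n$ covered nodes); the function names are assumptions made for this example.

```python
import math


def coverage_recurrence(N, steps):
    """Iterate n(m+1) = n(m) + (N - n(m)) / N from n(0) = 0; returns [n(0), ..., n(steps)]."""
    n = 0.0
    history = [n]
    for _ in range(steps):
        n += (N - n) / N  # expected gain: probability that the next forward hits a new node
        history.append(n)
    return history


def coverage_closed_form(N, m):
    """Continuous-limit solution n(m) = N * (1 - exp(-m / N))."""
    return N * (1.0 - math.exp(-m / N))


if __name__ == "__main__":
    N = 1000
    history = coverage_recurrence(N, 3 * N)
    for m in (N, 2 * N, 3 * N):
        print(m, round(history[m] / N, 4), round(coverage_closed_form(N, m) / N, 4))
```

The discrete recurrence equals $N(1 - (1 - 1/N)^m)$ exactly, so for large $N$ it is indistinguishable from the closed form; this is the same curve as the random sampling model, which is why the general algebraic model inherits its overestimation.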
This assumption holds when the current node is visited by the message for the first time and picks a link at random to forward the message. In this case, all remaining links are new, so the probability that the message reaches a new node through any link depends only on the proportion of new nodes in the network. However, the probability that the current node forwards the message to a new node decreases if the message has visited the node before, because the message may take a previously visited link and reach an already visited node again. So the probability that the message is forwarded to a new node is related to the degree of the current node. When the message visits a node again, the higher the node degree, the higher the probability that the message is forwarded through an unvisited link, and the closer the probability of reaching a new node is to the proportion of unvisited nodes in the network, that is, $(N - n(m))/N$; conversely, the lower the node degree, the lower the probability of reaching a new node. In the extreme case, all links of the node have already been visited. The two parameters needed to quantify this effect both depend, by definition, on the node visit times; their calculation is described below.
This paper modifies (8): if $m$ messages have been forwarded to $n$ nodes, let $Q(m)$ be the probability that a message is forwarded through an unvisited link; (8) then becomes

$$n(m+1) = n(m) + Q(m)\,\frac{N - n(m)}{N}. \tag{12}$$

In a network of $N$ nodes, if $m$ messages have been forwarded to $n$ nodes, the number of visits differs from node to node. As mentioned before, even if every node has degree $p(N - 1)$, the probability of an unvisited link differs from node to node: the more often a node has been visited, the smaller the probability that one of its links is unvisited. $Q(m)$, the probability of unvisited links in the network, depends on 2 factors: the probability that any node $i$ has been visited $t$ times and the probability of unvisited links of a node in that condition. So $Q(m)$ is the average of the unvisited-link probabilities of the nodes, weighted by the probabilities of their visit times. Before presenting the method for computing $Q(m)$, we first give the symbol definitions.
(1) Probability of Nodes Visited for Multiple Times. If $n$ nodes are covered after $m$ messages are forwarded, the probability $P_i^t$ that any node $i$ has been visited $t$ times is given by (14).
(2) Probability That There Are Unvisited Links after the Nodes Have Been Visited for Multiple Times. To calculate $Q_i^t$, the probability of unvisited links after a node has been visited multiple times, several values of the visit times $t$ must be considered; the following examples give the specific analysis, as shown in Figure 1 ($d = 8$).
(i) $t = 1$. The message visits the node for the first time. The initial state of the node is shown in Figure 1 and satisfies (15): all links are new ($L_i^1 = 0$), so after reaching the current node the message can leave through any link toward a new neighboring node, which gives (16). The node then moves to the state in Figure 1 in which the number of visited links is 2, meaning that the message used 2 different links to reach and leave the node. Because this is the only possible transition, the probability that the number of visited links changes from 0 to 2 after the first visit is given by (17).
(ii) $t = 2$. If the message visits the node a second time, the current node can only have 2 visited links ($L_i^2 = 2$), as stated in (18). The node picks one link at random to forward the message, and only the remaining $d - L_i^2$ unvisited links lead toward a new node, which gives (19). After the message leaves the node again, its link state changes to one of the three states shown in Figure 1; the transition conditions and probabilities are as follows.
(a) Old in, old out. The message arrives through a visited link and leaves through another visited link, so the number of visited links remains 2. The conditional probability that the number of visited links is still 2 after the node has been visited twice is given by (20).
(b) Old in, new out, or new in, old out. The message arrives through a visited link and leaves through an unvisited link, or it arrives through an unvisited link and leaves through a visited link. The conditional probability that the number of visited links changes from 2 to 3 is given by (21).
(c) New in, new out. The message arrives through an unvisited link and leaves through another unvisited link, so the number of visited links is 4. The conditional probability that the number of visited links changes from 2 to 4 is given by (22).
(iii) $t = 3$. If the message visits the node a third time, the initial state of the node may be any of the 3 states shown in Figure 1, with 2, 3, and 4 visited links, respectively. Because all these states arise from the single state with 2 visited links, the probabilities that there are 2, 3, or 4 visited links are given by (23). The mathematical expectation of the number of visited links of the node is given by (24), and the probability that the node forwards the message through an unvisited link by (25). After the message leaves the node for the third time, the node moves from one of these three states to one of the states of the fourth visit shown in Figure 1; the state transition conditions and probabilities are given by (26).
(iv) $t = 4$. If the message visits the node a fourth time, the initial state of the node may be any of 5 states, shown in Figure 1, with 2, 3, 4, 5, and 6 visited links, respectively. Because all these states arise from the three states of the third visit, the probabilities that there are $l$ ($l = 2, 3, 4, 5, 6$) visited links are given by (27). The mathematical expectation of the number of visited links of the node is given by (28), and the probability that the node forwards the message through an unvisited link by (29). After the message leaves the node for the fourth time, the conditional probability of the state transition ($l = 2, 3, 4, 5, 6$) is given by (30).
International Journal of Distributed Sensor Networks
Figure 1: Effects of visit times on the old and new states of node links.
(v) $t > 4$. In general, if the message visits the node for the $t$th time ($t > 4$), the number of old links $l$ of the node lies in the range $l \in [2, 2t - 2]$. The probability that the node has $l$ visited links is shown in (31). The mathematical expectation of the number of visited links of the node is shown in (32). The probability that the node forwards the message through an unvisited link is shown in (33). After the message leaves the node, the conditional probability of the transition between its old and new states is shown in (34).
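The case analysis above describes a small Markov chain on the number of visited links of a node. The Python sketch below computes that chain exactly under assumptions that are not confirmed by the text: the incoming link is chosen uniformly among the $d$ links of the node, the outgoing link is chosen uniformly among the other $d - 1$ links, and the transition probabilities follow from those two choices. The function names are hypothetical, and the resulting values need not coincide with equations (15)-(34).

```python
from fractions import Fraction


def step(dist, d):
    """Advance the distribution of visited-link counts by one visit (one arrival, one departure)."""
    new = {}
    for l, prob in dist.items():
        in_old = Fraction(l, d)   # message arrives through an already visited link
        in_new = 1 - in_old       # message arrives through an unvisited link
        moves = {
            # old in, old out: count unchanged
            l: in_old * Fraction(l - 1, d - 1),
            # old in, new out, or new in, old out: count grows by 1
            l + 1: in_old * Fraction(d - l, d - 1) + in_new * Fraction(l, d - 1),
            # new in, new out: count grows by 2
            l + 2: in_new * Fraction(d - l - 1, d - 1),
        }
        for k, q in moves.items():
            if q > 0:
                new[k] = new.get(k, Fraction(0)) + prob * q
    return new


def visited_link_distribution(d, visits):
    """Distribution of the number of visited links after the given number of completed visits."""
    dist = {0: Fraction(1)}
    for _ in range(visits):
        dist = step(dist, d)
    return dist


def expected_visited_links(dist):
    """Mean number of visited links under a distribution returned above."""
    return sum(l * p for l, p in dist.items())


if __name__ == "__main__":
    d = 8  # degree used in Figure 1
    for visits in range(1, 5):
        dist = visited_link_distribution(d, visits)
        print(visits, sorted(dist), float(expected_visited_links(dist)))
```

Under these assumptions the chain reproduces the qualitative structure described in the text: exactly 2 visited links after the first visit, 2 to 4 after the second, and 2 to 6 after the third, that is, a range of $[2, 2t - 2]$ at the start of the $t$th visit.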

Message Diffusion Model Based on Visit Times
According to Sections 3.2 and 3.3, combining (12), (13), (14), and (33) yields the improved model (35). Unlike the general algebraic model shown in (8), this modified model accounts, when a message is forwarded, for the effect of the current node state, namely, the visit times of the node, on the forwarding probability of the next step. In some extreme conditions, all the links of the current node have been visited, so no unvisited link can be taken in the next step. In the random pick model and the coupon collector's problem, however, every pick of a link is independent and the node state is not considered. So this modified model describes the ad hoc network message diffusion process more accurately, especially when the node coverage is high.

Experimental Environment and Parameter Setting.
To validate the ability of the proposed model to describe the message diffusion process of an ad hoc network, this paper compares the theoretical values of the quantitative analysis with the results of simulation experiments and examines the deviation. Before presenting the experimental results and analyses, it describes how the simulation environment and parameters were determined.
(1) Experimental Environment. To address the problems of the random sampling model in describing message transmission in an ad hoc network, this paper introduces node degree and visit times. To verify the improvement, the quantitative results of the random sampling model and of the proposed model are computed for different parameters in Matlab 7.12. The coupon collector's problem with sufficiently large $N$ and $n$ is equivalent to the random pick model; therefore, this paper uses the theoretical value of the coupon collector's problem for the analysis.
Both the coupon collector's problem and the random pick model yield theoretical results only, so this paper simulates the message diffusion process of an ad hoc network on the NS-2 V2.29 network simulation platform. To simulate ad hoc networks of different scales, $N$, the number of nodes in the network, is set to 1000, 2000, and 3000, and the network topology follows the E-R random graph model [20]. Because this paper focuses on the message diffusion process, the simulation ignores delay and packet loss, and each message data packet is 1 Byte. When the experiment starts, one node is selected at random from the $N$ nodes as the message source and transmits the message to other nodes. All nodes that receive the message forward it again in the same time slot, and the experiment ends when the message has been forwarded $3N$ times. Calculate the
According to (14), in a network of $N$ nodes, if $m$ messages have been forwarded and cover $n$ nodes, so that the node coverage is $n/N$, then $P_i^t$, the probability that any node $i$ has been visited $t$ times, changes with the node coverage, as Figure 2 shows ($N = 2000$).
Most nodes have been visited only 1-2 times when node coverage is low ($n/N < 0.5$). When the coverage is 0.5, the probability that a node has been visited once ($t = 1$) is 50.0% and twice ($t = 2$) is 34.7%. As node coverage increases ($0.5 < n/N < 0.9$), nodes are visited multiple times. When the node coverage reaches 0.9, the probabilities that a node has been visited multiple times ($2 \le t \le 5$) are 23.0% ($t = 2$), 26.5% ($t = 3$), 20.4% ($t = 4$), and 11.7% ($t = 5$). But the "$t \le 5$" curve in Figure 2 shows that the probability that a node has been visited more than 5 times is low for node coverage $n/N < 0.9$. Even when the coverage is 0.9, the probability that a node has been visited no more than 5 times still reaches 91.6%. Only when the coverage exceeds 0.9 does the probability of more than 5 visits increase noticeably, and even at a node coverage of 0.99, 51.2% of the nodes have still been visited no more than 5 times. So the simulation experiment does not consider more than 5 visits, and the visit times are set to 1, 3, and 5.
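The percentages quoted above are numerically consistent with a shifted Poisson law in which a covered node has received $t - 1$ additional visits at rate $\lambda = -\ln(1 - n/N)$. This is an observation about the reported numbers and an assumption of this sketch, not a restatement of equation (14); the function names are hypothetical.

```python
import math


def visit_probability(t, coverage):
    """Assumed shifted-Poisson probability of exactly t visits at a given coverage n/N."""
    lam = -math.log(1.0 - coverage)  # equals m/N under n = N * (1 - exp(-m/N))
    return math.exp(-lam) * lam ** (t - 1) / math.factorial(t - 1)


def at_most_visits(t_max, coverage):
    """Assumed probability of no more than t_max visits at a given coverage."""
    return sum(visit_probability(t, coverage) for t in range(1, t_max + 1))


if __name__ == "__main__":
    for coverage in (0.5, 0.9, 0.99):
        row = [round(100 * visit_probability(t, coverage), 1) for t in range(1, 6)]
        print(coverage, row, round(100 * at_most_visits(5, coverage), 1))
```

Agreement within rounding at coverages of 0.5, 0.9, and 0.99 suggests that truncating the analysis at 5 visits discards less than a tenth of the probability mass up to a coverage of 0.9.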
(3) Node Degree $d$. As mentioned in Section 3.1, this paper assumes that the ad hoc network topology follows the E-R random graph model $G(N, p)$, where $N$ is the number of nodes and $p$ ($0 < p < 1$) is the probability that a link exists between any 2 nodes, so the average node degree is $d = p \cdot (N - 1)$. For simplicity, we assume that every node has this average degree. To study the message diffusion process of the network, the network topology must be connected; otherwise, some nodes can never be reached.
Consequently, the number of nodes $N$ and the link probability $p$ of the ad hoc network must satisfy the condition $p \cdot N > \ln N$ in the simulation experiment. Then

$$d = p(N - 1) = pN - p > \ln N - p. \tag{36}$$

As (36) shows, for $d$ to meet the requirement for every combination of $N$ and $p$, we assume that both $d > \ln N$ and $d > \ln N - p$ hold. According to the curve of $\ln N$ in Figure 3, $d > \ln N$ holds only when the value of $d$ falls in the shaded region. In fact, if $N < 3000$, then $\ln N < 8.01$, and $d = 8$ is used to satisfy the requirement. If $3000 < N < 5000$, then $8.01 < \ln N < 8.52$, and $d$ must be more than 8. Although the number of nodes may be 3000-5000 in the network, $d$ is set to 8 and 10 in this simulation experiment to illustrate the effect of the proposed model.
In the result figures, the x-axis represents $m$, the total number of messages forwarded as the simulation proceeds, and the y-axis represents $n/N$, the node coverage. As mentioned, the experiment ends after the number of messages forwarded in the network reaches $3N$. The average of ten simulation runs is taken as the final result.
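The connectivity requirement on the node degree can be tabulated for the network sizes used in the experiments. The short Python sketch below is illustrative; the function names are assumptions, and the check uses the simplified condition $d > \ln N$ discussed above.

```python
import math


def is_degree_sufficient(N, d):
    """True when the average degree d exceeds the E-R connectivity threshold ln N."""
    return d > math.log(N)


def link_probability(N, d):
    """Link probability p that gives average degree d = p * (N - 1)."""
    return d / (N - 1)


if __name__ == "__main__":
    for N in (1000, 2000, 3000, 5000):
        print(N, round(math.log(N), 3),
              is_degree_sufficient(N, 8), is_degree_sufficient(N, 10),
              round(link_probability(N, 8), 5))
```

Note that $\ln 3000 \approx 8.006$ is slightly above 8, so $d = 8$ sits marginally below the threshold at $N = 3000$, while $d = 10$ clears it for every size up to 5000.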

Results and Comparison
Figures 4(a)-4(c) show that the random sampling model, represented by the coupon collector's problem, overestimates the node coverage in all conditions. When the node visit number equals 1 ($t = 1$), the proposed model underestimates the node coverage. With $t = 1$, only the case in which the message visits a node for the first time is considered. In fact, as messages are forwarded, nodes are visited multiple times, so the parameter setting $t = 1$ does not match the actual conditions. The figure shows that, as the experiment continues, the deviation between the $t = 1$ curve and the simulation curve grows larger and larger. In contrast, the curves for $t = 3$ and $t = 5$ are closer to the simulation curve, especially in Figure 4(a) ($N = 1000$).
In addition, from Figure 4(a) to Figure 4(c), the deviation between the simulation results and the theoretical results for $t = 3$ and $t = 5$ becomes increasingly large. The deviation is smallest, and the simulation results and theoretical values are closest, in Figure 4(a). In Figure 4(c), the deviation is largest, which means that the simulation result cannot validate the analytical result of the model. The cause is the value of the node degree $d$.
According to the analysis in Section 4.1, to ensure the connectivity of the experimental network topology, the average node degree $d$ must satisfy $d > \ln N$. In the experiment of Figure 4, $d$ is 8. In Figure 4(c), $N = 3000$ and $\ln N = 8.01$. With this setting of $d$, the network topology may be connected or unconnected. The simulation result is the average of ten runs; when the topology is not connected in some runs, some nodes cannot be reached and the node coverage is comparatively low, which produces the large deviation. So we repeat the above experiments with $d = 10$ and $N = 1000$, 2000, and 3000; the results are shown in Figures 5(a)-5(c).
Figures 5(a)-5(c) also show that the random sampling model overestimates the node coverage in every case. If $t$ equals 1, the proposed model underestimates the node coverage. Compared with Figure 4, Figure 5 shows a small deviation, without the problem seen in Figure 4(c). This means that, for the study of ad hoc network message diffusion, our model, which takes node visit times and node degree into account, is more accurate than the random sampling model when the network is connected; its results are closer to the simulated results, with sufficient accuracy.
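The effect of a marginal node degree on connectivity and coverage can be explored with a small stand-alone simulation. The Python sketch below is a simplified stand-in for the NS-2 experiment, not a reproduction of it: it uses a single non-backtracking random walk rather than the forwarding scheme of the paper, and all function names, seeds, and sizes are assumptions chosen for illustration.

```python
import random
from collections import deque


def er_graph(N, p, rng):
    """Adjacency lists of an E-R random graph G(N, p)."""
    adj = [[] for _ in range(N)]
    for u in range(N):
        for v in range(u + 1, N):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def is_connected(adj):
    """Breadth-first search from node 0; True when every node is reachable."""
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def walk_coverage(adj, steps, rng):
    """Covered-node count after each forward of one non-backtracking random walk."""
    current = rng.randrange(len(adj))
    previous = None
    visited = {current}
    history = []
    for _ in range(steps):
        # Prefer any link except the incoming one; fall back to it at a dead end.
        choices = [v for v in adj[current] if v != previous] or adj[current]
        if not choices:
            break  # isolated source node: the message cannot leave
        previous, current = current, rng.choice(choices)
        visited.add(current)
        history.append(len(visited))
    return history


if __name__ == "__main__":
    rng = random.Random(2024)
    N = 500
    for d in (6, 8, 10):
        graphs = [er_graph(N, d / (N - 1), rng) for _ in range(10)]
        connected = sum(is_connected(g) for g in graphs)
        finals = [h[-1] / N for h in (walk_coverage(g, 3 * N, rng) for g in graphs) if h]
        mean_cov = round(sum(finals) / len(finals), 3) if finals else None
        print(d, connected, mean_cov)
```

Averaging over runs in which some graphs are disconnected pulls the mean coverage down, which is the mechanism the text gives for the large deviation in Figure 4(c).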

Conclusion
In cooperative communication systems such as ad hoc networks or unstructured P2P networks, the message diffusion process can be abstracted as a random sampling model. But the message diffusion process is not a completely independent random process: when nodes forward messages, they are influenced by node degree and network connectivity. The random sampling model, however, does not consider that the current node may have been visited multiple times. So the estimate tends to be too high if the random sampling model is used to estimate ad hoc network node coverage at a given time point. As node coverage and the number of messages increase, random sampling is no longer an accurate model.
Exploring the message diffusion process of cooperative communication systems such as ad hoc networks, this paper analyzes the causes of the inaccuracy of the random sampling model and introduces node degree and visit times to solve the problem. It validates the effectiveness of the proposed model by comparing its results with simulation results. Our results are obtained only from the random graph network model, while, in the practical application,

$d$: node degree; $P_i^t$: probability that node $i$ is visited for the $t$th time; $Q_i^t$: probability that there is at least one unvisited link when node $i$ is visited for the $t$th time; $L_i^t$: the average number of visited links when node $i$ is visited for the $t$th time; $L_i^t(l)$: probability that there are $l$ ($0 \le l \le d_i$) visited links when node $i$ is visited for the $t$th time; $L_i^t(k \mid l)$: probability that the number of visited links changes from $l$ to $k$ ($0 \le l \le k \le d_i$ and $0 \le k - l \le 2$) when node $i$ is visited for the $t$th time. As a result, $Q(m)$ is computed by (13) as the average of $Q_i^t$ weighted by $P_i^t$.
3.3. Effects of Visit Times. From (13), 2 parameters must be defined to calculate the value of $Q(m)$: $P_i^t$ and $Q_i^t$. By definition, both are related to the visit times of nodes. The calculation of these two parameters is as follows.
), and $L_i^t(k \mid l)$ are given in Section 3.3 and are not repeated here.

Figure 2: Relationship between node visit times $t$ and node coverage $n/N$.

(2) Node Visit Times $t$. As shown in (35), the modified model involves 2 key parameters, $P_i^t$ and $Q_i^t$, which depend on $t$, the node visit times. So it is necessary to determine the visit times of nodes during ad hoc network message diffusion.
Figures 4(a)-4(c) show the message diffusion simulation results for $d = 8$ and $N = 1000$, 2000, and 3000, compared with the theoretical results of the coupon collector's problem and of the proposed model ($t = 1$, $t = 3$, and $t = 5$).