Gossiping with Message Splitting on Structured Networks

Gossiping of a single source with multiple messages (by splitting information into pieces) has previously been treated only for complete graphs, where it was shown to considerably reduce the completion time, that is, the first time at which all network nodes are informed, compared with single-message gossiping. In this paper, gossiping of a single source with multiple messages is treated for networks modeled as certain structured graphs, and upper bounds of the high-probability completion time are established through a novel "dependency graph" technique. The results provide useful insights into the behavior of multiple-message gossiping and can be useful for data dissemination in sensor networks, multihop content distribution, and file downloading in peer-to-peer networks.


Introduction
Data dissemination is a fundamental issue for information diffusion in networks [1][2][3][4][5]. Gossiping (a.k.a. rumor spreading) is an effective way of data dissemination: imagine a rumor that arises in a town and people who spread it among the population by gossiping [6]. There are two atomic types of gossiping protocols: "pull", in which an uninformed node selects a message it does not yet possess and requests it from a randomly selected neighboring node, and "push", in which an informed node selects a message it possesses and sends it to a randomly selected neighboring node.
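As a minimal illustration of the two atomic operations, the sketch below runs one synchronous round of single-message pull and push on a graph given as an adjacency dict, assuming uniform neighbor selection. The function names are hypothetical and the sketch is not the protocol analyzed later, which uses multiple messages. On a star whose hub is informed, one pull round informs every leaf, whereas one push round informs only one leaf.

```python
import random

def pull_round(adj, informed, rng):
    """One synchronous round of single-message pull: every uninformed node asks
    one uniformly random neighbor and learns the message if that neighbor was
    informed at the start of the round."""
    new = set(informed)
    for u in adj:
        if u not in informed and adj[u]:
            v = rng.choice(adj[u])
            if v in informed:
                new.add(u)
    return new

def push_round(adj, informed, rng):
    """One synchronous round of single-message push: every informed node sends
    the message to one uniformly random neighbor."""
    new = set(informed)
    for u in informed:
        if adj[u]:
            new.add(rng.choice(adj[u]))
    return new

if __name__ == "__main__":
    rng = random.Random(1)
    # Star graph: hub 0 connected to leaves 1..4 (illustrative example).
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    print(sorted(pull_round(star, {0}, rng)))  # [0, 1, 2, 3, 4]
    print(len(push_round(star, {0}, rng)))     # 2
```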
Gossip-based algorithms are a promising solution for many next-generation networks, due to their simplicity, robustness, flexibility, and scalability. Existing applications are numerous, such as consensus and averaging in sensor networks [7,8], ad hoc message routing [9], peer-to-peer file distribution [10], and information diffusion in social networks [11,12]. Extended versions of the basic gossiping protocol include algebraic gossip [13] and geographic gossip [14], but they incur additional overhead in terms of message complexity and geographic knowledge, respectively; in this paper, our attention is focused on the basic gossiping protocol.
Most analytical works on gossiping have dealt with high-probability bounds of the completion time. Supposing a static and connected network, the completion time of a gossiping protocol is the first time at which all nodes are informed. Several important classes of network topologies have been analyzed for gossiping of a single source with a single message, including complete graphs [6], general graphs and hypercubes [15], random graphs [15,16], random geometric graphs [17], graphs with edge expansion [18], and graphs with vertex expansion [19]. In addition, gossiping of all sources, each with a different single message, has been considered in [10,20].
On the other hand, gossiping of a single source with multiple messages has been treated in [10] for complete graphs. In practice, the multiple messages may be obtained by splitting a file into multiple units. It has been shown in [10] that, compared with single-message gossiping, gossiping with multiple messages significantly reduces the completion time. However, complete graphs lack geometry and allow direct message passing between any two nodes. In this paper, we are hence motivated to study gossiping with multiple messages on more structured graphs. Networks with various types of structure are more general and realistic than complete graphs, and studies of structured graphs bring new challenges and insights into the gossiping problem. The key question we seek to answer is how much benefit message splitting brings when message passing is restricted by the network topology. In this paper, we focus on a "sequential pull" protocol in which each node attempts to pull its desired messages sequentially following an indexing order. Our contributions are as follows.
(1) We develop a novel "dependency graph" analysis technique for gossip spreading and leverage it to establish a high-probability upper bound of the completion time for line graphs. In a nutshell, our result indicates that, in a line graph of $n$ nodes and with $m$ messages, the high-probability completion time scales at most like $O((n + m)\log(n + m))$ as both $n$ and $m$ grow large. This is inferior to that for complete graphs [10], which is $O(m + \log n)$, but still drastically superior to that without message splitting [17], which is $O(mn)$.
(2) We apply the result for line graphs to several other network topologies, including ring graphs, general graphs with given diameter and maximum degree, grid graphs, and random geometric graphs.
(3) We carry out numerical experiments to corroborate our analytical results.
The remaining part of this paper is organized as follows. In Section 2 we describe the sequential pull gossiping protocol. In Section 3 we establish the high-probability upper bound of the completion time on line graphs. In Section 4 we treat other classes of network topologies. In Section 5 we present numerical results. Finally Section 6 concludes this paper.

Pull-Based Gossiping Protocol
In general, a connected static network is represented as an undirected graph $G = (V, E)$ consisting of a set of nodes $V$ and a set of edges $E$. Every node in $V$ wishes to obtain an entire copy of a file, which has been split into multiple units, each of which serves as a message for gossiping. A pair of nodes may communicate directly with each other if and only if the nodes are connected by an edge in $E$. Time is slotted, and a message can be transferred from a sender to a receiver within a time slot, which is called a round throughout this paper. If a node obtains a message in round $t$, it may send that message to one of its neighbors from round $t + 1$ on; when it has obtained all messages, a node is said to be informed, and otherwise uninformed.
Similar to [20], we describe the protocol of pull-based gossiping with multiple messages by a random walk following a Markov chain. Initially, a single source node in $V$ is endowed with $m$ messages, indexed as $1, 2, \ldots, m$. In each of the following rounds, every uninformed node $u$ selects a message it does not yet possess and contacts a neighbor $v$ in its neighbor set $N(u)$ with probability $p_{uv}$, or contacts no neighbor with probability $p_{uu} = 1 - \sum_{v \in N(u)} p_{uv}$. If $v$ is contacted by $u$ and does possess the requested message, then $u$ obtains that message successfully. If every node requests the messages sequentially in order, that is, each uninformed node always pulls the message with the smallest index it does not yet possess, the gossiping protocol is called "sequential pull" with multiple messages [10]. Note that "sequential pull" may be an appropriate restriction in certain applications (e.g., those of a streaming or real-time nature) in which messages need to be accumulated sequentially for content reconstruction.
In particular, for a line graph $G = (V, E)$, in which nodes indexed as $1, 2, \ldots, n$ are sequentially placed from one endpoint to the other, the gossiping protocol can also be described as follows. Without loss of generality, assume that the endpoint node 1 is the source node. In each round, each uninformed node $i$ pulls the message with the smallest index it does not yet possess from its neighbor $i - 1$ with probability $p_i$. Throughout this paper, we focus on the case where $p_i \equiv p$, $2 \le i \le n$. Note that gossiping with multiple messages on line graphs can be used to model the content distribution process of a multihop data dissemination problem in sensor networks or ad hoc networks; see, for example, Section 5.
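One round of the sequential pull protocol can be sketched as follows, assuming synchronous rounds in which every transfer uses the holdings at the start of the round (so a message obtained in round $t$ is forwarded no earlier than round $t + 1$). The names `sequential_pull_round` and `line_graph` and the list-of-(neighbor, probability) format are illustrative choices, not taken from the paper. Because requests are sequential, a node's holdings are always the prefix $\{1, \ldots, k\}$, so a single integer per node suffices.

```python
import random

def sequential_pull_round(nbr_probs, have, m, rng):
    """One round of sequential pull with m messages.

    nbr_probs[u] lists (v, p_uv) pairs; with the leftover probability
    p_uu = 1 - sum(p_uv), node u contacts nobody this round.
    have[u] counts u's messages; they are always messages 1..have[u].
    """
    start = dict(have)                   # transfers use start-of-round holdings
    for u, choices in nbr_probs.items():
        if start[u] >= m:
            continue                     # u is already informed
        wanted = start[u] + 1            # smallest index u does not possess
        r, acc = rng.random(), 0.0
        for v, p_uv in choices:
            acc += p_uv
            if r < acc:                  # u contacts v
                if start[v] >= wanted:   # v possesses the requested message
                    have[u] = wanted
                break                    # at most one neighbor per round
    return have

def line_graph(n, p):
    """Line graph 1..n: node i >= 2 contacts only its predecessor i - 1, w.p. p."""
    return {i: ([(i - 1, p)] if i >= 2 else []) for i in range(1, n + 1)}

if __name__ == "__main__":
    rng = random.Random(7)
    n, m = 4, 3
    graph = line_graph(n, 1.0)
    have = {i: 0 for i in graph}
    have[1] = m                          # node 1 is the source
    rounds = 0
    while min(have.values()) < m:
        rounds += 1
        sequential_pull_round(graph, have, m, rng)
    print(rounds)  # 5, i.e., n + m - 2 when every pull succeeds
```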
In the subsequent analysis, the main attention is paid to the completion time $T$, which is the first time when all nodes in $V$ are informed. Our goal is to establish a high-probability upper bound of $T$, that is, a quantity that $T$ does not exceed with probability at least $1 - n^{-\alpha}$, for some constant $\alpha > 0$ and for any sufficiently large $n$.

Upper Bound of Completion Time on Line Graphs
In this section, we derive a high-probability upper bound of the completion time on line graphs by a novel "dependency graph" technique. To the best of our knowledge, this technique has not been identified or used for analyzing gossiping in the literature. The result for line graphs, together with that for complete graphs [10], reveals the potential benefit of message splitting for accelerating content distribution. Complete graphs and line graphs may be viewed as two extreme cases of graph topology, in that complete graphs have "no geometry" and no message-passing constraints, whereas line graphs have the "strongest geometry" with the most stringent message-passing constraints. Note that the message dissemination process of the gossiping protocol itself does not depend on the network topology; only our analysis of the completion time focuses on a specific class of graphs and thus uses knowledge of the network topology.

Gossiping with Single Message. Before treating gossiping with multiple messages, we consider the single-message case, which provides a baseline for the multiple-message case to compare with.
Proof of Lemma 1. For $2 \le i \le n$, let $X_i$ denote the time that node $i$ needs to successfully pull the message from node $i - 1$ after node $i - 1$ succeeds; then $T = \sum_{i=2}^{n} X_i$. Due to the memoryless nature of the gossiping process, $\{X_i, 2 \le i \le n\}$ are independent and identically distributed (i.i.d.) random variables, and all of them obey the geometric distribution with parameter $p$.
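The decomposition used in this proof can be checked with a short Monte Carlo sketch. Writing $X_i$ for the number of rounds node $i$ needs after node $i - 1$ has the message, each $X_i$ is geometric on $\{1, 2, \ldots\}$ with mean $1/p$, so $E[T] = (n - 1)/p$. The helper names and the values $n = 11$, $p = 0.5$ are illustrative assumptions.

```python
import random

def geometric(p, rng):
    """Rounds until the first successful pull (support 1, 2, ...)."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

def single_message_time(n, p, rng):
    """T = X_2 + ... + X_n: node i waits X_i ~ Geometric(p) rounds after node i - 1."""
    return sum(geometric(p, rng) for _ in range(2, n + 1))

if __name__ == "__main__":
    rng = random.Random(2024)
    n, p, runs = 11, 0.5, 20000
    mean = sum(single_message_time(n, p, rng) for _ in range(runs)) / runs
    # The empirical mean should be close to E[T] = (n - 1)/p = 20.0.
    print(round(mean, 2), (n - 1) / p)
```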

Gossiping with Multiple Messages. If a file of size $m$ units is split into $m$ messages and the sequential pull-based gossiping protocol is implemented, then each node may send messages to its neighbors without waiting until it obtains the entire file. This is the key idea behind leveraging message splitting to accelerate content distribution. In the following, we establish an upper bound of the completion time for this case. On the other hand, a high-probability upper bound of the completion time for complete graphs has been obtained in [10], and it scales like $O(m + \log n)$ with $n$ nodes and $m$ messages (in [10], the bound $O(m + \log n)$ is in fact derived when both "pull" and "push" operations are implemented in the gossiping process). It is no surprise that gossiping on complete graphs is much more efficient than on line graphs, since there is no communication constraint when all nodes can directly communicate with each other, and this performance gap is clearly reflected in the difference between the scaling behaviors.
The fundamental reason for the superiority of multiple-message gossiping (with message splitting) over single-message gossiping (without message splitting) can be explained as follows. When a file of size $m$ units needs to be transmitted in the single-message case, each node can only send data to its neighbors after receiving the entire file. In the multiple-message case, however, the file can be split into $m$ messages, and each node can send messages to its neighbors without waiting until it obtains the entire file. This parallelism, in which neighboring nodes simultaneously obtain different parts of the file, accelerates content distribution.
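A worked example with $p = 1$ (every pull succeeds) makes this parallelism concrete. It assumes, purely for illustration, that forwarding an unsplit file of $m$ units over one hop costs $m$ rounds. With splitting, the round $R(i, j)$ in which node $i$ obtains message $j$ satisfies $R(i, j) = \max\{R(i-1, j), R(i, j-1)\} + 1$, which gives $n + m - 2$ rounds in total instead of $m(n - 1)$. Function names are hypothetical.

```python
def unsplit_time(n, m):
    """Store-and-forward (assumed cost model): m rounds per hop, n - 1 hops."""
    return m * (n - 1)

def split_time(n, m):
    """Sequential pull on a line graph when every pull succeeds (p = 1).

    R[i][j] is the round in which node i obtains message j: node i needs j - 1
    first, and node i - 1 must already hold j (usable one round later).
    """
    R = [[0] * (m + 1) for _ in range(n + 1)]   # row 1 (source) and column 0 stay 0
    for i in range(2, n + 1):
        for j in range(1, m + 1):
            R[i][j] = max(R[i - 1][j], R[i][j - 1]) + 1
    return R[n][m]

if __name__ == "__main__":
    for n, m in [(10, 10), (100, 100)]:
        print(n, m, unsplit_time(n, m), split_time(n, m))
    # 10 10 90 18
    # 100 100 9900 198
```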
Before proving Theorem 2, we introduce the key concept of "dependency graph" for gossiping with multiple messages.
(1) Dependency Graph (See Figure 1). For $i = 2, 3, \ldots, n$ and $j = 1, 2, \ldots, m$, let $(i, j)$ denote a node-message pair, which indicates the event that node $i$ is pulling message $j$. We say that a node-message pair is realized if the node has obtained the message. We may thus construct a directed graph whose nodes are all the node-message pairs and call it the dependency graph, since it describes the dependency among all the node-message pairs. Specifically, $(i, j)$ has (at most) two incoming edges, from $(i - 1, j)$ and $(i, j - 1)$, respectively, meaning that node $i$ can start to pull message $j$ immediately after both events $(i - 1, j)$ and $(i, j - 1)$ are realized. Take Figure 1 for example: the event $(2, 1)$ has to be realized first; each of the remaining node-message pairs can be realized only after its predecessor pairs, that is, the pairs pointing to it, have already been realized.

For the dependency graph, we group the node-message pairs into multiple groups, so that two pairs $(i, j)$ and $(i', j')$ are in the same group if and only if $i + j = i' + j'$. For a line graph of $n$ nodes and $m$ messages, we thus have $n + m - 2$ groups, indexed by the node-message index sum from 3 to $n + m$. Note that a group has at most $d = \min\{n - 1, m\}$ node-message pairs; the quantity $d$ is called the degree of parallelism. Now consider a genie-aided gossiping schedule, based on the sequential pull-based gossiping protocol in Section 2, in which a genie enforces that the gossiping processes for the pairs in group $g + 1$ can start only after all the node-message pairs in group $g$ have been realized, for each $g = 3, \ldots, n + m - 1$. Since the genie can only delay the start of a pull and, by memorylessness, does not change how long a pull takes once it can succeed, the resulting completion time under the genie-aided schedule is stochastically no smaller than the actual completion time. In the following, we will first upper-bound the time for the node-message pairs in a group to be realized, and with that, we will prove Theorem 2.
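The grouping can be enumerated directly. The sketch below (helper names hypothetical) builds the pairs $(i, j)$, $2 \le i \le n$, $1 \le j \le m$, groups them by $i + j$, and lists the incoming edges of a pair, treating the boundary events $(1, j)$ and $(i, 0)$ as already realized. For $n = 5$ and $m = 3$ it yields $n + m - 2 = 6$ groups whose sizes never exceed $\min\{n - 1, m\} = 3$. Within one group every pair involves a different node $i$, so no node has to pull two messages at once.

```python
from collections import defaultdict

def dependency_groups(n, m):
    """Group node-message pairs (i, j), 2 <= i <= n, 1 <= j <= m, by g = i + j."""
    groups = defaultdict(list)
    for i in range(2, n + 1):
        for j in range(1, m + 1):
            groups[i + j].append((i, j))
    return dict(sorted(groups.items()))

def predecessors(i, j):
    """Pairs with an edge into (i, j); boundary events (1, j), (i, 0) need no realization."""
    return [(a, b) for a, b in [(i - 1, j), (i, j - 1)] if a >= 2 and b >= 1]

if __name__ == "__main__":
    groups = dependency_groups(5, 3)
    print(list(groups))                       # [3, 4, 5, 6, 7, 8]
    print([len(g) for g in groups.values()])  # [1, 2, 3, 3, 2, 1]
    print(predecessors(3, 2))                 # [(2, 2), (3, 1)]
```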

(2) Completion Time for Each Group
Lemma 3. Consider the genie-aided gossiping schedule described above, and let $T_g$ denote the time for all the node-message pairs in a group $g$ to be realized. Then, with $d = \min\{n - 1, m\}$, for all $\alpha > 0$,
$$\Pr\left\{ T_g > \frac{\log d + \alpha \log n + \log(n + m - 2)}{p} \right\} \le \frac{n^{-\alpha}}{n + m - 2}. \qquad (5)$$

Remark 3. We see that a high-probability upper bound of the completion time for each group is $O[\log(n + m)/p]$ when both $n$ and $m$ are large enough. In the following, we will see that the total completion time for the multiple-message case is the summation over all these groups, and the quantity $(n + m - 2)$ in the term $n^{-\alpha}/(n + m - 2)$ of (5) is the union bound factor for the summation.
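Proof of Lemma 3. Let $T_1$ denote the time for one node-message pair to be realized, that is, for its node to successfully pull its desired message. Since a node's attempt to pull from its predecessor node succeeds with probability $p$ in each round of the gossiping protocol, independently across rounds, $T_1$ is geometrically distributed, and we have
$$\Pr\left\{ T_1 > \frac{\log d + \alpha \log n + \log(n + m - 2)}{p} \right\} \le (1 - p)^{\frac{\log d + \alpha \log n + \log(n + m - 2)}{p}} \le e^{-[\log d + \alpha \log n + \log(n + m - 2)]} = \frac{n^{-\alpha}}{d\,(n + m - 2)}$$
(the value of $(\log d + \alpha \log n + \log(n + m - 2))/p$ might not be an integer, but this does not affect the magnitude of the asymptotic upper bound of $T_1$). Now, taking the union bound over all the node-message pairs in one specific group completes the proof, since each group has at most $d = \min\{n - 1, m\}$ node-message pairs and the realizations of those pairs are mutually independent.
International Journal of Distributed Sensor Networks 5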
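A numeric sketch of the bound in Lemma 3, assuming natural logarithms: with $d = \min\{n - 1, m\}$ and $t = [\log d + \alpha \log n + \log(n + m - 2)]/p$, a pair is still unrealized after $\lceil t \rceil$ rounds with probability $(1 - p)^{\lceil t \rceil} \le e^{-pt} = n^{-\alpha}/(d(n + m - 2))$, and a union bound over at most $d$ pairs gives the per-group failure probability $n^{-\alpha}/(n + m - 2)$. Function names and parameter values are illustrative.

```python
import math

def group_time(n, m, p, alpha):
    """t = [log d + alpha log n + log(n + m - 2)] / p, with d = min(n - 1, m)."""
    d = min(n - 1, m)
    return (math.log(d) + alpha * math.log(n) + math.log(n + m - 2)) / p

def group_failure_bound(n, m, p, alpha):
    """Union bound d * (1 - p)^ceil(t): some pair in the group is unrealized after t rounds."""
    d = min(n - 1, m)
    t = math.ceil(group_time(n, m, p, alpha))
    return d * (1 - p) ** t

if __name__ == "__main__":
    n, m, p, alpha = 100, 100, 0.5, 1.0
    target = n ** -alpha / (n + m - 2)
    print(f"{group_time(n, m, p, alpha):.1f}", group_failure_bound(n, m, p, alpha) <= target)
    # 29.0 True
```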
(3) Completion Time for Gossiping on Line Graphs. We are now ready to prove Theorem 2.
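Proof of Theorem 2. First, consider the case where $n - 1 \ge m$. In the dependency graph (see, e.g., Figure 1(a)), all node-message pairs are grouped by their index sum as
$$\mathcal{G}_g = \{(i, j) : i + j = g,\ 2 \le i \le n,\ 1 \le j \le m\}, \qquad g = 3, 4, \ldots, n + m,$$
where the group sizes grow from 1 to $m$, remain at $m$ for $m + 2 \le g \le n + 1$, and then shrink back to 1. Similarly, for the case where $n - 1 < m$ (see, e.g., Figure 1(b)), all node-message pairs are grouped in the same way, and the group sizes grow from 1 to $n - 1$, remain at $n - 1$ for $n + 1 \le g \le m + 2$, and then shrink back to 1. For both of these cases, note that there are in total $n + m - 2$ groups and that the number of node-message pairs in each group is at most $d = \min\{n - 1, m\}$. Using Lemma 3, we know that, in each group, all the node-message pairs are realized within time $[\log d + \alpha \log n + \log(n + m - 2)]/p$ with probability at least $1 - n^{-\alpha}/(n + m - 2)$.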
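Since in the genie-aided schedule the gossiping processes are executed group by group sequentially, with $n + m - 2$ groups, we have, writing $T_g$ for the time for all pairs in group $g$ to be realized and $d = \min\{n - 1, m\}$,
$$T \le \sum_{g=3}^{n+m} T_g \le \frac{(n + m - 2)\left[\log d + \alpha \log n + \log(n + m - 2)\right]}{p}. \qquad (9)$$
Finally, taking the union bound over the $n + m - 2$ groups shows that (9) holds with probability at least $1 - n^{-\alpha}$, which completes the proof of Theorem 2.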
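The coupling behind this proof can be simulated. Assuming a line graph with $p_i \equiv p$, the sketch below draws one geometric pull time `X[i][j]` per node-message pair and evaluates, on the same draws, (a) the sequential-pull completion time through the recursion $R(i, j) = \max\{R(i-1, j), R(i, j-1)\} + X_{ij}$, which by memorylessness has the distribution of the round-by-round process, (b) the genie-aided time, that is, the sum over groups of the largest pull time in each group, and (c) the explicit bound $(n + m - 2)[\log d + \alpha \log n + \log(n + m - 2)]/p$. On every draw (a) ≤ (b), which is the pathwise form of the stochastic-dominance argument. Helper names are hypothetical.

```python
import math
import random

def sample_pull_times(n, m, p, rng):
    """X[i][j] ~ Geometric(p) on {1, 2, ...}: rounds node i needs to pull
    message j once both predecessor pairs are realized."""
    X = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(2, n + 1):
        for j in range(1, m + 1):
            k = 1
            while rng.random() >= p:
                k += 1
            X[i][j] = k
    return X

def actual_time(X, n, m):
    """Pair (i, j) is realized X[i][j] rounds after the later of (i - 1, j) and (i, j - 1)."""
    R = [[0] * (m + 1) for _ in range(n + 1)]  # row 1 (source) and column 0 stay 0
    for i in range(2, n + 1):
        for j in range(1, m + 1):
            R[i][j] = max(R[i - 1][j], R[i][j - 1]) + X[i][j]
    return R[n][m]

def genie_time(X, n, m):
    """Group g = i + j starts only when group g - 1 is done and lasts as long
    as its slowest pair."""
    total = 0
    for g in range(3, n + m + 1):
        total += max(X[i][g - i] for i in range(max(2, g - m), min(n, g - 1) + 1))
    return total

def theorem2_bound(n, m, p, alpha):
    """(n + m - 2) [log d + alpha log n + log(n + m - 2)] / p with d = min(n - 1, m)."""
    d = min(n - 1, m)
    return (n + m - 2) * (math.log(d) + alpha * math.log(n) + math.log(n + m - 2)) / p

if __name__ == "__main__":
    rng = random.Random(11)
    n, m, p, alpha = 100, 100, 0.5, 1.0
    X = sample_pull_times(n, m, p, rng)
    print(actual_time(X, n, m), genie_time(X, n, m), round(theorem2_bound(n, m, p, alpha)))
```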

Completion Time on Several Structured Networks
In this section, we extend the result for line graphs to several other classes of network topologies. The analysis of line graphs is instrumental in establishing completion-time bounds for other classes of graphs; see, for example, [15,17]. We define $T(G)$ as the high-probability completion time of gossiping with multiple messages on network graph $G$ if all nodes obtain all messages within $T(G)$ rounds with probability at least $1 - c_0 \cdot n^{-\alpha}$ for some constant $\alpha > 0$ and fixed constant $c_0 > 0$, for any sufficiently large $n$. For a connected network of $n$ nodes with maximum degree $\Delta$, assume that each node $u$ pulls messages from any of its neighbors with equal probability $1/\deg(u)$, where $\deg(u)$ is the degree of $u$. In order to apply Theorem 2, consider the shortest path from the source node to another node $u$, and denote the length of this shortest path by $\ell_u$. Along this shortest path, in each round, each node pulls a message from its predecessor node with probability at least $1/\Delta$. So, from Theorem 2, the completion time for node $u$ to obtain all the messages is upper-bounded by $O(\Delta(\ell_u + m)(\log(\ell_u + m) + \alpha \log n))$, with probability at least $1 - n^{-\alpha}$. In the following, we apply this result to several classes of graphs.
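To make the path argument concrete, the following sketch computes shortest-path lengths by breadth-first search and plugs each length $\ell$ into the explicit line-graph expression $(n' + m - 2)[\log d + \alpha \log n + \log(n' + m - 2)]/p$ with $p = 1/\Delta$. Our assumptions for illustration: a path of length $\ell$ has $n' = \ell + 1$ nodes, so the group count is $\ell + m - 1$ and the degree of parallelism is $\min\{\ell, m\}$; following the text, the $\alpha \log n$ term uses the size $n$ of the whole network. Since the text states the bound in $O(\cdot)$ form, the constants here are only indicative, and the helper names are hypothetical.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def per_node_bound(adj, source, m, alpha):
    """Line-graph bound along each node's shortest path, with p = 1/Delta."""
    n = len(adj)
    delta = max(len(nbrs) for nbrs in adj.values())
    bounds = {}
    for u, l in bfs_distances(adj, source).items():
        if l == 0:
            bounds[u] = 0.0              # the source already holds every message
            continue
        groups = l + m - 1               # n' + m - 2 with n' = l + 1 path nodes
        d = min(l, m)                    # degree of parallelism along the path
        bounds[u] = delta * groups * (math.log(d) + alpha * math.log(n) + math.log(groups))
    return bounds

if __name__ == "__main__":
    # A 4-node path 1-2-3-4 with the source at node 1 (illustrative example).
    path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
    for u, b in sorted(per_node_bound(path, 1, m=5, alpha=1.0).items()):
        print(u, round(b, 1))
```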
(1) Ring Graphs. A ring graph of $n$ nodes can simply be divided into two equal-length line graphs, each of length $n/2$, and its maximum degree is $\Delta = 2$. Therefore, we have the following corollary.

Corollary 4. For a ring graph $G$ of $n$ nodes, the sequential pull-based gossiping protocol with $m$ messages behaves like
$$T(G) = O\big((n + 2m)\left(\log(n + 2m) + \alpha \log n\right)\big)$$
for any $\alpha > 0$.
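The decomposition behind Corollary 4 can be sketched as follows, assuming the ring's nodes are labeled $0, \ldots, n - 1$ with the source at node 0 (labels and helper names are hypothetical). One arc runs clockwise to the node $\lfloor n/2 \rfloor$ hops away and the other counterclockwise over the remaining nodes, so every node lies on a line graph of at most $\lfloor n/2 \rfloor$ hops that starts at the source, and every node has degree $\Delta = 2$.

```python
def ring(n):
    """Ring of n >= 3 nodes labeled 0..n-1; each node has two neighbors."""
    return {u: [(u - 1) % n, (u + 1) % n] for u in range(n)}

def split_ring(n, source=0):
    """Split the ring into two line graphs that both start at the source.

    The clockwise arc ends floor(n/2) hops away; the counterclockwise arc
    covers the remaining nodes, so neither arc exceeds floor(n/2) hops.
    """
    half = n // 2
    clockwise = [(source + k) % n for k in range(half + 1)]
    counterclockwise = [(source - k) % n for k in range(n - half)]
    return clockwise, counterclockwise

if __name__ == "__main__":
    cw, ccw = split_ring(7)
    print(cw, ccw)  # [0, 1, 2, 3] [0, 6, 5, 4]
```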
(2) General Graphs with Fixed Diameter and Maximum Degree. For a general connected graph with diameter $D$ and maximum degree $\Delta$, a high-probability upper bound of the completion time in the single-message case is $O(\Delta(D + \log n))$ [15]. Now, for any node in the network, from our discussion above and Theorem 2, the completion time for that node to obtain all the messages is upper-bounded by $O(\Delta(D + m)(\log(D + m) + (1 + \alpha)\log n))$, with probability at least $1 - n^{-1-\alpha}$. Taking the union bound over all the $n$ nodes in the network hence leads to the following corollary.
(3) Grid Graphs. For a grid graph of $n$ nodes on a $\sqrt{n} \times \sqrt{n}$ lattice, by setting $D = 2\sqrt{n}$ and $\Delta = 4$ in (11), we have the following corollary.

Simulations
In this section, we carry out experiments to validate our analytical results for sequential pull-based gossiping with multiple messages. We treat a multihop data dissemination problem in sensor networks or ad hoc networks, in which the content distribution can be exactly modeled by a multiple-message gossiping process on line graphs as well. Consider a multihop network in which a source node wishes to spread a file of size $m$ units to a destination node. The file is equally split into $m$ messages, indexed as $1, 2, \ldots, m$. Suppose the routing path from the source to the destination is known and consists of $n$ nodes, which are sequentially indexed as $1, 2, \ldots, n$ from the source to the destination. Time is slotted, and a message can be transferred from a sender to a receiver within a round. In each round, each uninformed node $i$ requests the message with the smallest index it does not yet possess from its neighbor $i - 1$. All communications between neighbors are assumed to fail with probability $1 - p$, since there may be wireless errors or the requested nodes may be busy. The protocol overhead is assumed to be handled by the physical-layer design and is not considered herein. Recalling the gossiping protocol described in Section 2, we see that this multihop data dissemination can be exactly modeled by a multiple-message gossiping process on a line graph.
In our experiments, we run the multiple-message gossiping process 1,000,000 times and record the completion time. The line graph $G = (V, E)$ consists of $n$ nodes, and one endpoint is endowed with $m$ messages. The success probability of communications from senders to receivers is $p = 0.5$ in all experiment runs. The pseudocode of the simulation using the multiple-message gossiping protocol is presented in Algorithm 1.
The simulation results on the completion time of gossiping are shown in Figures 2 and 3, together with the theoretical results; all curves are plotted on a log-log scale. In Figures 2 and 3, one of the two parameters $n$ and $m$ is fixed to be 100 while the other ranges from 50 to 150. In both figures, the curve "mn/p" presents the theoretical completion time of naive gossiping without message splitting, the curve "(m + n) * log(m + n)/p" presents the theoretical completion time of multiple-message gossiping predicted by Theorem 2, and the curve "gossiping" presents the maximum completion time recorded in our experiments.

Algorithm 1: Simulation of the multiple-message gossiping protocol.
    for k = 1 → 1000000 do
        Initialization: the source node in V is informed at round t = 0.
        while not all nodes in V are informed do
            t = t + 1.
            for each uninformed node u do
                u selects a neighbor v from its neighbor set N(u) with probability p_uv ≡ p.
                u selects the message j ≤ m with the smallest index it has not possessed.
                u attempts to pull message j from the neighbor v.
                if v does possess message j then
                    u obtains message j at the beginning of round t + 1.

From the simulation results, we see that the benefit of message splitting for accelerating data dissemination is significant, and the upper bound of the multiple-message gossiping protocol established in Theorem 2 is validated as well. However, there is still a gap between the analytical upper bound and the simulated completion time. Finding a tighter upper bound of the completion time for gossiping with multiple messages on line graphs remains an open issue.
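A runnable Python rendering of Algorithm 1, offered as a sketch rather than the authors' simulator: it fixes the pull success probability at $p$ for every node, checks possession before drawing the random success (which changes the random stream but not the distribution), uses far fewer runs than the 1,000,000 above for speed, and reports the maximum completion time over runs alongside the two theoretical curves. Helper names are hypothetical.

```python
import math
import random

def gossip_line(n, m, p, rng):
    """One run of sequential pull on a line graph: node 1 is the source, and each
    uninformed node i pulls its smallest missing message from node i - 1,
    succeeding with probability p if node i - 1 holds that message."""
    have = [0] * (n + 1)             # have[i] = messages held by node i (always 1..have[i])
    have[1] = m
    t = 0
    while min(have[1:]) < m:
        t += 1
        start = have[:]              # a message obtained in round t is usable from round t + 1
        for i in range(2, n + 1):
            if start[i] < m and start[i - 1] > start[i] and rng.random() < p:
                have[i] = start[i] + 1
    return t

def experiment(n, m, p, runs, seed=0):
    """Maximum completion time over independent runs (the "gossiping" curve)."""
    rng = random.Random(seed)
    return max(gossip_line(n, m, p, rng) for _ in range(runs))

if __name__ == "__main__":
    n, m, p = 100, 100, 0.5
    worst = experiment(n, m, p, runs=20)    # far fewer runs than in the experiments above
    print(worst, (n + m) * math.log(n + m) / p, m * n / p)
```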

Conclusions
In this paper, we have investigated the problem of gossiping with multiple messages on structured networks so as to shed insight into the behavior of multiple-message gossiping. We have developed the "dependency graph" analytical technique and further derived an upper bound of the high-probability completion time on line graphs. The results reveal the potential benefit of message splitting for accelerating content distribution through networks, and the result for line graphs has been further extended to several other classes of network topologies.