Diffusing information for mobile social networks under consideration of dynamic influence

With the development of new technologies, mobile social networks have become widespread. To obtain and spread information over mobile social networks efficiently, the influence maximization problem asks for a seed node set of limited size that influences as many nodes as possible. Previous works ignore the dynamic influence phenomenon that arises when information diffuses over mobile social networks. In this article, we propose a new model that describes the procedure of diffusing information in the presence of dynamic influence. Theoretical analysis shows that the influence maximization problem under the new model is NP-hard (non-deterministic polynomial-time hard), and an efficient approximation algorithm is proposed. Experimental studies on real data sets show that the new model handles dynamic influence well during information diffusion and that the proposed algorithms solve the influence maximization problem on the new model efficiently.


Introduction
Recently, with the development of communication and computing techniques, modern smart phones have spread rapidly across the world. According to reports, by 2015 the number of mobile phone users in the world had reached 4.45 billion, and 42.9% of them (about 2 billion) were using smart phones. According to reports from China, the number of smart phone users reached 5 billion in 2014. In addition, with the emergence of further devices such as tablets and smart watches, more and more smart mobile devices are becoming popular. Because of the development of embedded computing, sensing, and communication techniques, smart mobile devices have become increasingly powerful, and they are no longer only tools for communication. In many applications, they have proved to be powerful tools for mobile sensing, computing, and so on. For example, the iPhone 6 integrates at least eight types of sensors, such as ALS (ambient light sensor), PS (proximity sensor), and GPS (Global Positioning System), and it has a 2.6 GHz 64-bit processor; besides the common modules for telecommunication, it also has powerful WiFi and Bluetooth devices.
As the number and power of smart mobile devices increase, more and more online social applications are being used in mobile environments, and traditional social networks have evolved into mobile social networks. Typical applications of mobile social networks include Facebook, Twitter, Google+, and Sina Weibo. According to reports, by 2014 the number of monthly active users (MAU for short) of Facebook was about 1.3 billion, and in 2015 the MAU of Sina Weibo reached 212 million in China alone. As more and more mobile applications are used, more and more natural mobile social networks are being created.
The emergence of mobile social networks changes the way information diffuses and provides opportunities for viral marketing, as shown in the work by Chen et al. [1] Different from traditional marketing methods, viral marketing can exploit the ''word-of-mouth'' advantages of mobile social networks and diffuse advertising information more efficiently. It has attracted much research interest from both the mobile and social computing areas. The influence maximization problem is one of the most popular topics in the area of mobile social networks. It was formally investigated by Kempe et al. [2] and has received much attention from many researchers. [3][4][5] However, important challenges remain unsolved when the influence maximization problem is applied to more complex real scenarios. One of them, which is also the motivation of this article, is that the influence ability between nodes in the real world usually changes dynamically, while most current research on the influence maximization problem assumes that the influence ability is static.
Like general networks, social networks are composed of at least nodes and edges. Usually, each node represents one social actor (e.g. one person in the physical world or one user of Facebook), and the edges represent the social interactions between social actors (e.g. two persons are friends, or one user follows another on Facebook). When considering the influence maximization problem on mobile social networks, the edges usually have special meanings related to influence (e.g. one user accepts the advice of another). The most popular approach in previous works is as follows. First, a model is built to describe the information diffusion procedure on mobile social networks; then, algorithms for finding a seed set with maximum influence are designed. In classic models, whether user A will be influenced by user B depends on the influence probability between them. In detail, if B has accepted some advice and the influence probability between B and A is 0.9, A will accept the same advice with probability 0.9 in the next step. The influence probability is a simple and clear way to describe the ability of influence between two nodes, and it is usually assumed to be a constant once a specific social network is considered. However, in mobile social networks, the influence probability may change dynamically.
Let us consider a real-life example of the dynamic influence phenomenon during information diffusion, which has been observed by only a few previous works. [6][7][8] Assume that A is a user of Twitter, which can be treated as a directed graph, and that before lunch A is browsing the tweets of the users he follows. Using the GPS or WiFi devices of A's smart phone, the Twitter app may know the location of A and push an advertisement about a nearby restaurant X. After seeing it, A may have his lunch in X and post tweets to recommend X to his friends. Assume A has two friends B and C, where B is also near X but C is not. Then, B is more likely than C to view the advertisement and go for a taste, while A may have the same influence probability on both B and C for general information. Let us also imagine and compare two similar scenarios. In the first case, A receives a tweet posted by his friend B; here, B is the original author of that tweet. In the second case, A still receives the tweet from B, but B reposted it from C. Obviously, the influence probability from B to A in the first case should be higher than that in the second case. In the two examples above, the influence probability is not static for two given nodes, and it can change during the procedure of information diffusion. To the best of our knowledge, only a few previous works consider such challenges, and none of them considers the influence maximization problem under dynamic influence directly.
The examples above show two important sources of dynamic influence: location and time delay. In a mobile social network, if two nodes have the same or similar locations, the influence probability between them tends to become larger. If a piece of information has a long time delay or has been reposted several times by users in sequence, the influence probability between two users after receiving such information tends to become smaller. Here, the location need not be a geographic position; it can be any profile information of nodes that enhances the social interactions between nodes. For example, when the nodes represent academic authors, the affiliation or research area can be the ''location.'' The time need not be a real time either; it can be any value about the information that affects the interest of nodes in that information. For example, we can use the number of times that one user has received the same information to measure the ''time.'' Only a few works focus on dynamic influence caused by these factors. Zhou et al. [6] considered only the ''location'' factor but not the ''time'' factor. Goyal et al. [7] and Leskovec et al. [8] observed dynamic influence caused by the ''time'' factor. Leskovec et al. [8] summarized a series of interesting scenarios, including dynamic influence in social networks, through experiments on real data sets. Goyal et al. [7] studied how to learn such dynamic influence efficiently. Neither of them directly addresses the influence maximization problem under dynamic influence. Moreover, when dynamic influence is considered, the algorithms designed for the classical influence maximization problem cannot be applied. This article studies the influence maximization problem under dynamic influence caused by the location and time factors.
In this article, we address the problem of diffusing information under dynamic influence in mobile social networks. Following the methods of previous works, we also use the influence maximization problem to describe the procedure of diffusing information over mobile social networks. To overcome the challenge of dynamic influence, we modify the classic models so that they can describe the change in influence caused by the location and time factors. To solve the influence maximization problem efficiently, we study its computational complexity and design efficient algorithms. The main contributions can be summarized as follows:
The rest of the article is organized as follows. In section ''Related work,'' the related works are discussed. Then, some preliminaries and new definitions are introduced in section ''Notations and definitions.'' In section ''Approximation algorithm for IMD problem,'' theoretical analysis and approximation algorithms for influence maximization problems are introduced. An improved algorithm is given in section ''Improving Algorithm GREEDYIMD.'' Experimental results are shown in section ''Experimental evaluation.'' Finally, the article is concluded.

Related work
Influence maximization is an important problem in the research area of online social networking, with many applications such as viral marketing and computational advertising. It was first studied by Domingos and Richardson, [9, 10] and the formal definitions and a comprehensive theoretical analysis were given by Kempe et al. [2] The standard formal definition of influence maximization can be explained as follows: the input is a graph that represents the ''influence'' relationships between nodes, together with the constraint that at most k nodes can be selected, and the problem is to compute a set of k nodes such that the number of nodes influenced by these k nodes is maximum. Different models have been formally defined to simulate information propagation processes with different characteristics, and the two most popular models are the independent cascade (IC for short) and linear threshold (LT for short) models. In the work by Kempe et al., [2] the influence maximization problems under both the IC and LT models are shown to be NP-hard (non-deterministic polynomial-time hard), and the problem of computing the exact influence of a given node set is shown to be #P-hard in the work by Chen et al. [1] Many research efforts have been made on the problem of finding the node set with maximum influence. Kempe et al. proposed a greedy algorithm for influence maximization that has the constant approximation ratio $(1 - 1/e)$. The time complexity of this greedy approximation algorithm is $O(n^2(m + n))$, which is too high for large-scale social networks. To overcome the shortcomings of greedy-based algorithms, many researchers have focused on improving the efficiency of influence maximization. By studying the submodularity of influence functions, Leskovec et al. [11] proposed the CELF (cost-effective lazy-forward) algorithm. 
CELF improves the performance of greedy-based algorithms for influence maximization by reducing the number of evaluations of the influence set of a given seed set; however, its performance on large-scale data is still not satisfactory. Using similar ideas, CELF++ was proposed by Goyal et al. [12] to further improve the performance of influence maximization algorithms. In the work by Chen et al., [3] the DegreeDiscount algorithm is proposed to improve the performance of greedy-based influence maximization algorithms. By assuming that all influence probabilities in the IC model are the same, Chen et al. [3] reduce the complexity of the influence maximization problem and give better algorithms based on the new models. Utilizing the structural properties of communities in social networks, Chen et al. [13] proposed new algorithms that merge similar nodes and reduce the cost of computing the influence set. Goyal et al. [14] proposed the SIMPATH algorithm, which improves the performance of greedy-based influence maximization in the LT model. Jiang et al. [15] proposed simulated annealing-based influence maximization algorithms.
Kimura and Saito [16] proposed new models of information propagation based on the idea of finding shortest paths, which assume that the information is mainly transferred through shortest paths, and designed new heuristic algorithms for influence maximization problems. Using this model, Chen et al. [1] proposed heuristic algorithms based on maximum broadcast paths, which assume that the information propagated on the network is transferred not by shortest paths but by maximum broadcast paths. Based on the influence probabilities between users, an influence tree is built for each single node by computing the maximum broadcast paths, and it can be used to estimate the influence range of each user. By assigning a threshold to each user, the influence tree can be pruned to ignore nodes that contribute little to the computation of the influence set, which reduces the number of nodes involved in the influence computation. Chen et al. [1] also proved the submodularity of influence functions defined on maximum broadcast paths and designed approximation algorithms with approximation ratio $1 - 1/e$. In the work by Han et al., [17] timeliness networks with opportunistic selection are investigated and the information maximization model is extended to those applications. In the work by Shi et al., [18] a maximal time bound is considered to limit the ability of diffusing information in social networks, and efficient algorithms for computing the maximal time-bounded positive influence set are proposed. In the work by Chen et al., [13] similarities of nodes within communities in social networks are utilized to reduce the number of nodes involved in the influence computation. Kim et al. [19] proposed efficient influence maximization algorithms in parallel computing settings. Cai et al. [20] try to extend the information maximization models to the applications of crowd-sourced data-based social networks. Han et al. [21] consider the communities in social networks and study the influence maximization problem over such networks.
There are also many works that try to extend the classic influence maximization methods to other application settings. Li et al. [22] study the problem of influence maximization in location-based social networks. In those networks, one node can be influenced by another node if and only if they are neighbors according to their location information, and Li et al. [22] focus on the problem of finding k users that influence the maximum number of users in the location-based social network. Tang et al. [23] identify the relation types involved in propagating the information and formally define the problem of influence maximization by considering different types of relationships between nodes. A key idea is that, given certain information to be propagated, the influence set of a node set can be computed more efficiently by removing the edges of certain types. Chen et al. [24] study the problem of influence maximization in topic-aware applications. Cai et al. [25] use the idea of information diffusion to prevent sensitive information in social networks. More related work on applications in social networks can be found in the works of Han et al. [26] and Bi et al. [27] Although the problem of extending classic influence maximization methods has been studied by many research works as shown above, we are not aware of any efforts on influence maximization in a dynamic setting.
The influence models are usually defined in terms of several parameters that describe the key properties of real applications, so the parameter selection problem is essential to influence maximization methods. Tang et al. [5] propose topic factor graph (TFG) models to determine different parameters between users and topics. Liu et al. [4] determine different influence parameters among users by using probabilistic models to analyze the relationships between the topic distributions of users and the influence relations of users. Weng et al. [28] utilize latent Dirichlet allocation (LDA) models to describe the topic distributions of users and propose the TwitterRank method to determine the influence probabilities between users and topics.

Notations and definitions
In this section, classical information diffusion (ID) models are introduced first; then, to integrate the two challenging aspects of diffusing information on mobile social networks, a new model is proposed by modifying the classical one. Finally, based on the new ID model, we give the new definition of the influence maximization problem. In fact, the problem of influence maximization depends on the definition of the diffusion model. For all diffusion models, the related influence maximization problems are based on the same idea, but they differ in computational hardness.

Traditional ID models
In this article, information diffusion is described as the procedure of propagating information over some network. A network is usually denoted by a graph $G(V, E)$. Here, $V$ is the node set, where each node represents one person or entity, and $E$ is the edge set, where each edge represents the relation (cooperation, friends, enemies, and so on) between two nodes. Each node is in either the active or the inactive state. Intuitively, the active state means that the node has been affected. The active nodes may affect the inactive nodes, and the influence ratio describes the strength of that effect. If an inactive node is affected by some active node so much that it becomes active, the process is called activation. Intuitively, for a node $v$, the more neighbors of $v$ are activated, the more likely $v$ is to be activated. After that, $v$ will affect further nodes. As such procedures repeat, more and more nodes become active. The procedure of activation cannot be reversed: a node can change from the inactive state to the active state, but not vice versa. To design a proper theoretical model of information diffusion in the real world, the key is to explain how the interactions between nodes work. Next, we introduce two popular ID models.
LT model. In the LT model, the influence of node $u$ on its neighbor $v$ is described by a weight $b_{u,v}$. During the procedure of information diffusion, a threshold value $\theta_v$ associated with each node $v$ is used to control the diffusion of information. In detail, at some time instant, let $A(v)$ be the set of $v$'s neighbors that are already active. If $\sum_{u \in A(v)} b_{u,v} \ge \theta_v$, $v$ becomes active. In this model, when node $u$ tries to activate its neighbor $v$ and fails, the influence $b_{u,v}$ is remembered and accumulated in the following activation steps. In other words, the influence from $u$ to $v$ is not ignored even if the activation fails. As we will see in the following part, the influence is treated differently in other models. The whole procedure of information diffusion in the LT model can be described as follows. First, an initial active node set $S_0$ is activated. Then, in the $i$th step of information diffusion, based on the active nodes in $S_{i-1}$, the influence on each node in $V \setminus S_{i-1}$ is computed. According to the computed influence and the threshold $\theta_v$ of each node $v$, all nodes satisfying $\sum_{u \in A(v)} b_{u,v} \ge \theta_v$ are put in $S_i$. These steps are repeated until no more nodes can become active.
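The LT procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than an implementation from the literature: the function name `linear_threshold`, the dictionary-based graph encoding, and the toy weights and thresholds are all assumptions made for the example, and the thresholds are taken as given inputs.

```python
def linear_threshold(edges, thresholds, seeds):
    """Run LT diffusion until no further node can be activated.

    edges:      dict mapping (u, v) -> influence weight b_uv   (assumed encoding)
    thresholds: dict mapping v -> threshold theta_v
    seeds:      iterable with the initial active set S_0
    """
    active = set(seeds)
    while True:
        # Accumulate the weight each inactive node receives from active neighbors.
        incoming = {}
        for (u, v), b_uv in edges.items():
            if u in active and v not in active:
                incoming[v] = incoming.get(v, 0.0) + b_uv
        # A node becomes active once the accumulated weight reaches its threshold.
        newly_active = {v for v, w in incoming.items() if w >= thresholds[v]}
        if not newly_active:
            return active
        active |= newly_active


# Hypothetical toy network: c needs the combined weight of a and b.
edges = {("a", "b"): 0.6, ("a", "c"): 0.3, ("b", "c"): 0.3, ("c", "d"): 0.2}
thresholds = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
print(sorted(linear_threshold(edges, thresholds, {"a"})))
```

In the toy network, node c is not activated by a alone, but the remembered weight from a is added to the weight from b in the next step, which illustrates the accumulation of influence in the LT model.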
IC model. The IC model is a probabilistic model. Instead of $b_{uv}$ in the LT model, this model uses $p_{uv}$ to describe the probability that $u$ activates $v$ in a single activation attempt. The whole procedure of information diffusion under the IC model can be described as follows. First, an initial node set $S_0$ is set to be active. Then, in the $i$th step, every active node tries to activate its neighbors. In detail, for each node $u \in S_{i-1}$ and node $v \in V \setminus S_{i-1}$, if $(u, v) \in E$, $v$ is activated with probability $p_{uv}$. If $v$ indeed becomes active, it is added to $S_i$ and not considered further in the current step. This procedure is repeated until no new nodes are added. It should be noted that $p_{uv}$ is determined only by $u$ and $v$ and is independent of other node pairs. In this model, each edge $(u, v)$ is considered only once. Once the attempt fails, this edge is never considered again. In the work by Kempe et al., [2] an extended model in which $p_{uv}$ decreases as time goes by is considered.
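A single run of the IC procedure can be sketched as follows. The sketch assumes that only the nodes activated in the previous step make new attempts, which is one common way to ensure that each edge is tried at most once; the function name `independent_cascade` and the graph encoding are illustrative choices.

```python
import random


def independent_cascade(edges, seeds, rng):
    """One random run of IC diffusion.

    edges: dict mapping (u, v) -> activation probability p_uv   (assumed encoding)
    seeds: iterable with the initial active set S_0
    rng:   a random.Random instance, passed in for reproducibility
    """
    out_edges = {}
    for (u, v), p_uv in edges.items():
        out_edges.setdefault(u, []).append((v, p_uv))

    active = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous step
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p_uv in out_edges.get(u, []):
                # Each edge (u, v) gets exactly one chance, when u has just become active.
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active


# Hypothetical toy network; the result depends on the random draws.
toy = {("a", "b"): 0.9, ("b", "c"): 0.5, ("a", "c"): 0.1}
print(sorted(independent_cascade(toy, {"a"}, random.Random(42))))
```

Because the process is random, the influence of a seed set is normally assessed by averaging the size of the returned set over many runs.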

New model for diffusing information
None of the previous models for information diffusion considers dynamic influence; this part proposes a new model that integrates both the ''location'' and ''time'' factors, which are major sources of dynamic influence in mobile social networks.
ID model for dynamic influence. In the ID model, the mobile social network is represented by a graph $G = (V, E)$. Here, $V$ is the node set and $E$ is the directed edge set, which represents the influence relationships between nodes in the network. Intuitively, an edge $(u, v) \in E$ says that $v$ can be influenced by $u$. That is, if $u$ has been influenced, $v$ may also be influenced through the edge $(u, v)$. We use a function $P : E \to [0, 1]$ to represent the influence probability of each edge, which describes how much influence one node has on another. For an edge $(u, v)$, we denote its value of $P$ by $p_{uv}$. The parameters introduced above are the same as in the IC model; to describe the dynamic influence during information diffusion, we need more parameters.
To integrate the dynamic influence caused by ''location'' factors, we use the function $L : V \times V \to [0, 1]$, defined over $V \times V$, to describe the location relationship between nodes. Intuitively, the more likely two nodes are to have the same location, the more likely they are to influence each other and the higher the corresponding value of $L$ is. In the following, we use $l_{uv}$ to denote the $L$ value of the node pair $(u, v)$.
To integrate the dynamic influence caused by ''time'' factors, we use the function $C : E \to [0, 1]$ to describe the change of influence caused by ''time delay.'' In the following, we use $c_{uv}$ to denote the $C$ value on the edge $(u, v)$.
Therefore, in the ID model, to describe the information diffusion on a mobile social network, we need a four-tuple $\langle G, P, L, C \rangle$.
In the ID model, given the network $\langle G = (V, E), P, L, C \rangle$, a seed node set $A$, and a threshold $0 \le \theta \le 1$, the information diffusion process, which works in discrete time, can be explained as follows. Here, we use $t_0, t_1, \ldots$ to represent the discrete time steps. Initially, at time $t_0$, all nodes in $A$ become active and are inserted into the set $Z$, and all other nodes are initialized to be inactive. At time $t_i$, all active nodes try to activate their ''new'' neighbors, that is, the neighbors they meet for the first time at time $t_i$. In detail, suppose node $u$ is active and $v$ is an inactive, newly met neighbor of $u$, that is, $u$ and $v$ did not meet before $t_i$. If $l_{uv} \ge \theta$, node $u$ tries to activate $v$ in two steps. First, a random value $x_1$ between 0 and 1 is generated. If $x_1 \le l_{uv}$, node $v$ is activated with probability $c_{uv}^{i-1}$. Otherwise, $v$ becomes active with probability $p_{uv} \cdot c_{uv}^{i-1}$. If $l_{uv} < \theta$, node $v$ becomes active with probability $p_{uv} \cdot c_{uv}^{i-1}$. It should be noted that each edge $(u, v)$ is used only once, that is, $u$ has only one chance to activate $v$. This procedure iterates until no new nodes can be added to $Z$. Finally, $Z$ is the influenced set of $A$ under the network $G$. During the whole procedure, the state of a node can change from inactive to active, but not vice versa. Moreover, a node may receive several activation attempts, but at most one from each of its neighbors.
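The diffusion process of the ID model can be sketched as follows. The sketch makes two assumptions that go beyond the text: a node activated at time $t_{i-1}$ is taken to meet its out-neighbors at time $t_i$, and the graph is encoded as a dictionary of out-edges carrying $(v, p_{uv}, l_{uv}, c_{uv})$. The names `try_activate` and `diffuse` are illustrative.

```python
import random


def try_activate(p_uv, l_uv, c_uv, step, theta, rng):
    """One activation attempt of active node u on inactive node v at time t_step."""
    decay = c_uv ** (step - 1)  # time factor c_uv^(i-1); equals 1 at t_1
    if l_uv >= theta and rng.random() <= l_uv:
        # Location branch (x_1 <= l_uv): activate with probability c_uv^(i-1).
        return rng.random() < decay
    # Default branch: activate with probability p_uv * c_uv^(i-1).
    return rng.random() < p_uv * decay


def diffuse(out_edges, seeds, theta, rng):
    """out_edges: dict u -> list of (v, p_uv, l_uv, c_uv). Returns the active set Z."""
    active = set(seeds)
    frontier, step = list(seeds), 1
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p_uv, l_uv, c_uv in out_edges.get(u, []):
                # Each edge is used once: u tries v only in the step after u's activation.
                if v not in active and try_activate(p_uv, l_uv, c_uv, step, theta, rng):
                    active.add(v)
                    newly_active.append(v)
        frontier, step = newly_active, step + 1
    return active


# Hypothetical chain a -> b -> c with c_uv = 0: influence survives only one hop.
chain = {"a": [("b", 1.0, 0.0, 0.0)], "b": [("c", 1.0, 0.0, 0.0)]}
print(sorted(diffuse(chain, {"a"}, theta=0.5, rng=random.Random(0))))
```

The chain example isolates the time factor: the first hop has decay $c_{uv}^{0} = 1$ and succeeds, while the second hop has decay $c_{uv}^{1} = 0$ and always fails.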
The idea of integrating dynamic influence into the information diffusion procedure can be explained as follows. The function $L$ measures the location similarity between two nodes. If they are similar, $v$ becomes active with a probability higher than the original $p_{uv}$. This describes the case in which a node tends to be influenced by the nodes ''nearby.'' The function $C$ measures the decrease in influence caused by time delay. When the ''time'' factor is considered, node $v$ is activated with a probability lower than $p_{uv}$. In real applications, using this model requires choosing proper values for the parameters, which can be done by sophisticated learning methods. The problem of choosing values for the parameters is beyond the scope of this article. Furthermore, we have the following observations. Observation 1. Without the function $C$, node $v$ is activated by $u$ with probability $l_{uv} + (1 - l_{uv}) p_{uv}$, which is larger than $p_{uv}$. Observation 2. Without the function $L$, node $v$ is activated by $u$ with probability $p_{uv} \cdot c_{uv}^{i-1}$, which is smaller than $p_{uv}$.
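The two observations follow from a short calculation, written out here as a worked step. It assumes $l_{uv} \ge \theta$ in the first case and reads ''without $C$'' as $c_{uv} = 1$ and ''without $L$'' as $l_{uv} = 0$ with $\theta > 0$; the inequalities are strict only when $l_{uv} > 0$, $p_{uv} < 1$ (first case) or $c_{uv} < 1$, $i > 1$, $p_{uv} > 0$ (second case).

```latex
\begin{align*}
\Pr[\,u \text{ activates } v \mid c_{uv} = 1,\ l_{uv} \ge \theta\,]
  &= l_{uv} \cdot 1 + (1 - l_{uv})\, p_{uv} \\
  &= p_{uv} + l_{uv}\,(1 - p_{uv}) \;\ge\; p_{uv}, \\[4pt]
\Pr[\,u \text{ activates } v \mid l_{uv} = 0 < \theta\,]
  &= p_{uv}\, c_{uv}^{\,i-1} \;\le\; p_{uv}
  \qquad \text{since } 0 \le c_{uv} \le 1 .
\end{align*}
```

For instance, with $l_{uv} = 0.7$ and $p_{uv} = 0.6$, the first expression gives $0.6 + 0.7 \cdot 0.4 = 0.88$.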
Let us consider a real example of the ID model shown in Figure 1. Given $\{A\}$ as the seed node set and the threshold $\theta = 0.6$, an example of the information diffusion procedure is shown in Figure 2. In the first step $t_0$, node $A$ is initialized to be active and no edges are processed. In the second step $t_1$, the two edges connected with $A$ are processed. For $(A, C)$, since $l_{AC} = L(A, C) = 0.7 > \theta = 0.6$, the activation of node $C$ is attempted in two steps. First, assume that the random value generated is 0.85. Because $0.85 > l_{AC} = 0.7$, $C$ is activated with probability $p_{AC} \cdot c_{AC}^{i-1} = p_{AC} \cdot c_{AC}^{0} = p_{AC} = 0.6$. For the edge $(A, B)$, because $l_{AB} < \theta$, $B$ is activated with probability $p_{AB} \cdot c_{AB}^{i-1} = p_{AB} = 0.9$. It should be noted that since the diffusion procedure is a probabilistic process, nodes with high probabilities may still stay inactive and nodes with low probabilities may become active. Let us assume that node $C$ becomes active but node $B$ does not, as shown in Figure 2. The following steps shown in Figure 2 can be summarized as follows:

Influence maximization problem on ID model
Based on the observations of the ID model above, it is easy to see that the result set $Z$ can be partitioned into several disjoint subsets $\{Z_0, Z_1, \ldots\}$ according to the time when the nodes in $Z$ become active. Also, if we consider the procedure of information diffusion on the ID model as a breadth-first traversal of the directed graph $G$, $\{Z_i\}$ is obtained by partitioning $Z$ according to the depth of each node.
The objective of the general influence maximization problem is to maximize the node set influenced by the seed node set. In the ID model, information diffusion is a probabilistic process, so which nodes become active during the procedure is uncertain. Therefore, we need to understand the procedure based on possible world semantics.
Let $\Omega$ be the set of all different possible worlds of a given ID model. Each possible world $X \in \Omega$ is determined uniquely by an assignment to all probabilistic variables in the information diffusion. For a given $G = (V, E)$, if we do not consider the seed node set, the number of possible worlds is $2^{|E|}$. Moreover, for a given seed set $A$, let $G_A$ be the subgraph of $G$ induced by the node set $A$ and $E_A$ be the set of edges in $G_A$; the number of all possible worlds is then $2^{|E| - |E_A|}$. In the following, we introduce how to compute the probability of the diffusion process and define the influence maximization problem based on possible world semantics. It should be noted that in the ID model, two different processes may reach the same possible world. Therefore, the probability of a single process and the probability of a possible world are different. Formally, given an information diffusion process $M$, let $G_M$ be the possible world reached by $M$ and $S$ be the set of edges processed during $M$; then the probability $\Pr(G_M)$ can be computed by $\prod_{e \in G_M \cap S} \Pr(e) \cdot \prod_{e \in S \setminus G_M} (1 - \Pr(e))$. Since the edges considered by our information diffusion model are independent of each other, the main idea of $\Pr(G_M)$ is to compute the probability of the whole graph by combining the probabilities of all edges processed during information diffusion. Not all edges in $G$ need to be considered, since some edges are never visited during the information diffusion procedure because of the topology.
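The product formula for $\Pr(G_M)$ can be evaluated directly once the per-edge probabilities $\Pr(e)$ of the processed edges are known. The sketch below assumes these probabilities are supplied as a dictionary; the function name `process_probability` and the numbers in the usage example are hypothetical.

```python
from math import prod  # available in Python 3.8+


def process_probability(edge_prob, processed, live):
    """Probability of one diffusion process M.

    edge_prob: dict mapping edge -> Pr(e) for that edge
    processed: set S of edges tried during M
    live:      set of edges in the reached possible world G_M (successful tries)
    """
    # Successful processed edges contribute Pr(e); failed ones contribute 1 - Pr(e).
    return prod(
        edge_prob[e] if e in live else 1.0 - edge_prob[e]
        for e in processed
    )


# Hypothetical values: two edges are processed, only (A, C) succeeds.
edge_prob = {("A", "B"): 0.9, ("A", "C"): 0.6}
processed = {("A", "B"), ("A", "C")}
live = {("A", "C")}
print(process_probability(edge_prob, processed, live))  # (1 - 0.9) * 0.6
```

Edges of $G$ that are never processed do not appear in `processed` and therefore do not affect the product, which matches the remark that unvisited edges need not be considered.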
Usually, we use the function $\sigma(\cdot)$ to represent the influence range of a given seed node set. That is, given a seed set $A$, $\sigma(A)$ is the set of nodes that become active after diffusing the information from $A$. Observing the above procedure of information diffusion, for each single process, we have $\sigma(A) = Z$. However, since the diffusion process is probabilistic, we need a definition based on possible world semantics. Definition 1. Influence function. Given an ID model $\langle G, P, L, C \rangle$, a seed node set $A$, and a threshold $\theta$, let $\{G_1, \ldots, G_m\}$ be the set of possible worlds. The influence function $\delta$ measures the expected influence of $A$ on $G$. For a given $A$, $\delta(G, A, \theta)$ is defined to be $\sum_{i=1}^{m} \Pr(G_i) \cdot |V_{G_i}|$, and it is also denoted by $\delta(A)$ for simplicity.
Note that for each possible world $G_i$, its node set $V_{G_i}$ is exactly the set of nodes influenced by $A$ in that world.
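Since $\delta(A)$ is an expectation over possible worlds, one standard way to approximate it is Monte Carlo simulation: run the diffusion process many times and average the size of the active set. The sketch below is such an estimator under stated assumptions, not a procedure taken from this article. It uses the combined per-attempt probability $(l_{uv} + (1 - l_{uv}) p_{uv}) \cdot c_{uv}^{i-1}$ when $l_{uv} \ge \theta$ and $p_{uv} \cdot c_{uv}^{i-1}$ otherwise, and the names `simulate_once` and `estimate_influence` are illustrative.

```python
import random


def simulate_once(out_edges, seeds, theta, rng):
    """One diffusion run; out_edges: dict u -> list of (v, p_uv, l_uv, c_uv)."""
    active, frontier, step = set(seeds), list(seeds), 1
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p, l, c in out_edges.get(u, []):
                if v in active:
                    continue
                # Combined probability of the two-step attempt at time t_step.
                q = (l + (1 - l) * p if l >= theta else p) * c ** (step - 1)
                if rng.random() < q:
                    active.add(v)
                    newly_active.append(v)
        frontier, step = newly_active, step + 1
    return len(active)


def estimate_influence(out_edges, seeds, theta, runs=1000, seed=0):
    """Monte Carlo estimate of delta(A): average |Z| over independent runs."""
    rng = random.Random(seed)
    total = sum(simulate_once(out_edges, seeds, theta, rng) for _ in range(runs))
    return total / runs


# Hypothetical single edge with p = 0.5: the expected influence of {a} is 1.5.
g = {"a": [("b", 0.5, 0.0, 1.0)]}
print(estimate_influence(g, {"a"}, theta=0.5))
```

The accuracy of the estimate improves with the number of runs, at a cost linear in `runs`.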

Definition 2. Influence maximization on ID model. Given an ID model $\langle G, P, L, C \rangle$, a threshold $\theta$, and an integer $k > 0$, the problem is to find a subset $A \subseteq V$ satisfying $|A| = k$ such that $\delta(A)$ is maximized.
In the following parts, we will use IMD (influence maximization under dynamic influence) to represent the influence maximization problem on ID model.

Approximation algorithm for IMD problem
In this section, the computational complexity of the IMD problem is studied first, which indicates that the problem is intractable and should be solved by approximation or randomized methods. To design approximation algorithms for the IMD problem, the ID model proposed above is simplified to reduce the possible worlds of the model and to integrate dynamic influence. Finally, an approximation algorithm is proposed, and formal analysis shows that its approximation ratio can be bounded.

Hardness of solving IMD problem
Since the influence maximization problem on classic models is usually NP-hard and approximation and heuristic algorithms are often needed, we first consider the computational complexities of IMD problem.

Theorem 1. IMD problem is NP-hard.
Proof. The theorem can be proved by observing that the classical influence maximization problem on the IC model in the work by Kempe et al. [2] is a special case of the IMD problem. The details are as follows.
We prove that the IMD problem is NP-hard by a direct reduction from the classical influence maximization problem on the IC model in the work by Kempe et al. [2] Given a classical influence maximization instance $I$, setting $c_{uv} = 1$ and $l_{uv} = 0$ for every $(u, v)$ and $\theta = 0$ yields the corresponding instance $I'$ of the IMD problem. With these settings, every edge $(u, v)$ succeeds with probability exactly $p_{uv}$ at every time step, as in the IC model. Furthermore, it can be easily verified that there is a bijection between the solutions of $I$ and $I'$. Therefore, the IMD problem is NP-hard.
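The reduction in the proof can be checked mechanically on a small instance. The sketch below builds the IMD parameters from an IC instance and verifies that the per-edge activation probability of the ID model then equals $p_{uv}$ at every step. The function names are illustrative, and $L$ is restricted to the edges of the graph for brevity.

```python
def ic_to_imd(ic_edges):
    """Map an IC instance (dict (u, v) -> p_uv) to ID parameters (P, L, C, theta)."""
    P = dict(ic_edges)
    L = {e: 0.0 for e in ic_edges}  # l_uv = 0 for every edge
    C = {e: 1.0 for e in ic_edges}  # c_uv = 1: no time decay
    return P, L, C, 0.0             # threshold theta = 0


def id_edge_probability(p, l, c, theta, step):
    """Probability that one ID-model attempt succeeds at time t_step."""
    base = l + (1 - l) * p if l >= theta else p
    return base * c ** (step - 1)


# Hypothetical IC instance.
ic = {("a", "b"): 0.3, ("b", "c"): 0.8}
P, L, C, theta = ic_to_imd(ic)
for e in ic:
    for step in range(1, 6):
        assert id_edge_probability(P[e], L[e], C[e], theta, step) == ic[e]
print("ID edge probabilities match the IC probabilities")
```

With $l_{uv} = 0$ the location branch is taken with probability zero, and with $c_{uv} = 1$ the time factor never reduces the probability, so only $p_{uv}$ remains.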

Simplifying the ID model
By Example 3, it is easy to see that the computation of possible world probabilities is tricky and hard to handle when solving the IMD problem. In this part, we propose a method to simplify the ID model by folding the function $L$ into the influence probabilities.
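Given an instance $I = \langle G, P, L, C \rangle$ of the ID model, we can build another instance $I' = \langle G, P', L', C \rangle$ as follows. Let the threshold be $\theta$. For every edge $(u, v)$, set $p'_{uv} = l_{uv} + (1 - l_{uv}) \cdot p_{uv}$ if $l_{uv} \ge \theta$ and $p'_{uv} = p_{uv}$ otherwise, and set $l'_{uv} = 0$. After this transformation, the new instance contains only a zero-valued $L$ function, so $L'$ plays no further role and can be deleted from $I'$. Lemma 1. For each possible world $G_1$ produced by the information diffusion process over $I$ or $I'$, we have $\Pr_I(G_1) = \Pr_{I'}(G_1)$.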
Proof. Considering one possible world G 1 , we will show that the probabilities of G 1 are same. Because G 1 is a deterministic graph, by making a breadth-first traversal, it is easy to determine the edges which will not be visited during the information diffusion. For the left edges, we consider the following cases. For special edge (u, v) in I, if l uv ! u, Prðu; vÞ = l uv Á c iÀ1 uv + ð1 À l uv ÞÁ p uv Á c iÀ1 uv . Otherwise, Prðu; vÞ = p uv Á c iÀ1 uv . The computation of Pr(u, v) is based on the definition of information diffusion procedure.
While in I#, since L# value is 0, for any edge (u, v), the value Prðu; vÞ = p 0 uv Á c iÀ1 uv . For any edge (u, v) which satisfies l uv . u in I, we have Prðu; vÞ = p uv Á c iÀ1 uv according the definition of transformation above. For edge (u, v) satisfying l uv ! u in I, we have Prðu; vÞ = p 0 uv Á c iÀ1 uv = l uv Á c iÀ1 uv + ð1 À l uv Þ Á p uv Á c iÀ1 uv . Obviously, all edges in the possible world have same probabilities to be chosen. Therefore, each possible world in I and I# has same probabilities. Finally, the distributions of possible worlds of I and I# are same.
As shown above, the ID instance after the transformation has a zero-valued $L$ function; therefore, we ignore the $L$ function in the following discussion and use only $\langle G, P, C \rangle$ to represent an instance of the ID model. Moreover, given any instance $I = \langle G, P, L, C \rangle$ and threshold $\theta$, the transformation can be completed in polynomial time.
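The transformation can be illustrated with a short sketch. The code below is only an illustration under stated assumptions: the dictionary-based edge representation and the names `fold_l_into_p`, `p`, `l`, and `theta` are hypothetical and do not come from the original implementation. The update rule is the one used in the proof of Lemma 1, namely $l_{uv} + (1 - l_{uv}) \cdot p_{uv}$ when $l_{uv}$ reaches the threshold and $p_{uv}$ otherwise.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # hypothetical edge representation: (u, v)


def fold_l_into_p(p: Dict[Edge, float],
                  l: Dict[Edge, float],
                  theta: float) -> Dict[Edge, float]:
    """Return the edge probabilities P' of the simplified instance.

    The L' function of the simplified instance is identically zero,
    so it is not returned. Edges missing from `l` are treated as
    having l_uv = 0 (an assumption of this sketch).
    """
    p_new: Dict[Edge, float] = {}
    for edge, p_uv in p.items():
        l_uv = l.get(edge, 0.0)
        if l_uv >= theta:
            # Dynamic influence applies: fold it into the probability.
            p_new[edge] = l_uv + (1.0 - l_uv) * p_uv
        else:
            # Below the threshold the original probability is kept.
            p_new[edge] = p_uv
    return p_new


if __name__ == "__main__":
    p = {("a", "b"): 0.5, ("b", "c"): 0.2}
    l = {("a", "b"): 0.4, ("b", "c"): 0.1}
    print(fold_l_into_p(p, l, theta=0.3))
```

Each edge is visited once, which matches the statement that the transformation takes polynomial (here linear) time in the number of edges.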

Efficient approximation algorithm
According to Theorem 1, the IMD problem cannot be solved in polynomial time unless P = NP; therefore, in the following parts, our aim is to find efficient approximation algorithms with performance guarantees. As shown in the work by Kempe et al., 2 the monotone and submodular properties allow us to develop greedy algorithms that achieve a $(1 - 1/e - \varepsilon)$ approximation ratio. Here, a function $d(\cdot)$ defined on subsets of $V$ is monotone if $d(S_1) \le d(S_2)$ for any $S_1 \subseteq S_2$, and it is submodular if $d(S_1 \cup \{x\}) - d(S_1) \ge d(S_2 \cup \{x\}) - d(S_2)$ for any $S_1 \subseteq S_2$ and any $x \in V \setminus S_2$. Therefore, in the following parts, we utilize these properties to design efficient approximation algorithms for the influence maximization problems in this article.
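To make the two properties concrete, the following sketch checks them by brute force on a small ground set. It is an illustration only: the helper names `subsets`, `is_monotone`, and `is_submodular` are hypothetical, and the check is exponential in the size of the ground set, so it is useful for intuition on toy examples rather than for real networks.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone(f, ground, tol=1e-12):
    """Check f(S1) <= f(S2) for all S1 subset of S2 within `ground`."""
    all_sets = list(subsets(ground))
    return all(f(s1) <= f(s2) + tol
               for s1 in all_sets for s2 in all_sets if s1 <= s2)


def is_submodular(f, ground, tol=1e-12):
    """Check the diminishing-returns inequality for all S1 subset of S2."""
    ground = frozenset(ground)
    all_sets = list(subsets(ground))
    for s1 in all_sets:
        for s2 in all_sets:
            if not s1 <= s2:
                continue
            for x in ground - s2:
                gain_small = f(s1 | {x}) - f(s1)
                gain_large = f(s2 | {x}) - f(s2)
                if gain_small < gain_large - tol:
                    return False
    return True


if __name__ == "__main__":
    # Toy coverage function (a stand-in, not the influence function d_IMD).
    cover = {"a": {1, 2}, "b": {2, 3}, "c": {3}}
    coverage = lambda s: len(set().union(*(cover[v] for v in s)))
    print(is_monotone(coverage, cover), is_submodular(coverage, cover))
```

The coverage function in the usage example is monotone and submodular, whereas a function such as $|S|^2$ is monotone but fails the submodularity check.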
We propose an algorithm based on the greedy idea, which achieves an approximation ratio of $1 - 1/e$, as shown by Fisher et al. 29 The algorithm is shown in Figure 3. Algorithm GREEDYIMD takes $I = \langle G = (V, E), P, C \rangle$ and an integer $k > 0$ as input parameters. First, the set $S$ for storing the selected seed nodes is initialized to be empty (lines 1-2). Then, considering the nodes one by one, in each round the algorithm chooses a node $v$ with maximum $D_v$ and inserts it into $S$ (lines 3-11). $D_v$ is the influence gain obtained by adding $v$ to $S$, that is, $D_v = d(S \cup \{v\}) - d(S)$. Here, the value of the function $d(\cdot)$ is computed by invoking the procedure GETINFLUENCE (line 7).
In the GETINFLUENCE procedure, the inputs are the seed node set $S$ and the instance $I$ of the ID model, and the goal is to return $d_{IMD}(S)$. As shown in the work by Chen et al., 1 computing $d(\cdot)$ is already #P-hard under the classic IC model. Therefore, GETINFLUENCE uses sampling to estimate the influence of a given seed node set. First, the variable storing the final influence is initialized to zero (line 1). Then, the sampling is run $n$ times (lines 1-17) (the value of $n$ can be determined according to the results in the work by Kempe et al. 2 ), and the average of all sampled influences is returned (line 18). In each sampling iteration, all temporary variables are initialized first (lines 3-6). Then, the seed nodes are inserted into a queue $Q$ (line 7), which is used to perform a breadth-first traversal of $G$. For each node, the probability of becoming active is calculated as $p_{uv} \cdot c_{uv}^{i-1}$ (line 12).
Finally, since the main procedure of Algorithm 3 iterates over all nodes and the GETINFLUENCE procedure only enumerates the edges of $G$, it is easy to verify that Algorithm GETINFLUENCE finishes in polynomial time. In GETINFLUENCE, the time cost of lines 3-17 can be bounded by $O(|E|^2)$; therefore, the time cost of GETINFLUENCE can be bounded by $O(n \cdot |E|^2)$. Combining GETINFLUENCE with GREEDYIMD, the total time cost of GREEDYIMD can be bounded by $O(k \cdot n \cdot |V| \cdot |E|^2)$.
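The two procedures described above can be sketched as follows. This is a minimal illustration, not the algorithm listing of Figure 3: the adjacency-list graph, the dictionaries `p` and `c` keyed by edge, and the function names `get_influence` and `greedy_imd` are assumptions made for the example. The sketch assumes the simplified instance $\langle G, P, C \rangle$, in which an edge tried at iteration $i$ succeeds with probability $p_{uv} \cdot c_{uv}^{i-1}$, and it assumes that seeds sit at depth 0 so that their outgoing edges are tried at iteration 1.

```python
import random
from collections import deque


def get_influence(graph, p, c, seeds, n_samples, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    total = 0
    for _ in range(n_samples):
        active = set(seeds)
        queue = deque((s, 0) for s in seeds)  # (node, depth); seeds at depth 0
        while queue:
            u, depth = queue.popleft()
            i = depth + 1  # iteration in which u tries its out-edges
            for v in graph.get(u, ()):
                if v in active:
                    continue  # every node is activated at most once
                prob = p[(u, v)] * c[(u, v)] ** (i - 1)
                if rng.random() < prob:
                    active.add(v)
                    queue.append((v, i))
        total += len(active)
    return total / n_samples


def greedy_imd(graph, p, c, k, n_samples=200, seed=0):
    """Greedy seed selection: repeatedly add the node with the largest gain."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    current = 0.0  # estimated influence of the seeds chosen so far
    for _ in range(min(k, len(nodes))):
        best_node, best_gain = None, float("-inf")
        for v in sorted(nodes - set(chosen)):
            estimate = get_influence(graph, p, c, chosen + [v], n_samples, rng)
            gain = estimate - current
            if gain > best_gain:
                best_node, best_gain = v, gain
        chosen.append(best_node)
        current += best_gain
    return chosen


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": [], "d": []}
    p = {("a", "b"): 0.8, ("b", "c"): 0.6}
    c = {("a", "b"): 0.5, ("b", "c"): 0.5}
    print(greedy_imd(graph, p, c, k=2))
```

Because the estimates are sampled, the selected seeds can vary with the number of samples and the random seed; ties are broken here by node name, which is a choice of this sketch.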

Analysis of Algorithm GREEDYIMD
In this part, we show that Algorithm GREEDYIMD has a performance guarantee on the approximation ratio. The main idea is to show that the influence function $d_{IMD}$ is monotone and submodular.
First, we introduce another view of the information diffusion on the ID model. According to the results in the work by Kempe et al., 2 an equivalent view of the information diffusion process on the IC model is as follows: each edge $(u, v)$ of $G$ is declared live independently with probability $p_{uv}$ and blocked otherwise. Therefore, different assignments of live and blocked states to the edges represent different outcomes of information diffusion on the IC model. Moreover, Kempe et al. 2 have shown that the two views have the same distribution. We can use a similar view of the information diffusion process on the ID model: each edge $(u, v)$ of $G$ is declared live with probability $p_{uv} \cdot c_{uv}^{i-1}$ and blocked otherwise. It should be noted that these probabilities are no longer independent, because they depend on the depth $i$, which is affected by whether other edges have become live. This difference makes the formal analysis harder, and we show how to handle it in the following.
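Second, we introduce a new representation for the influence function $d_{IMD}(\cdot)$. According to the definition of $d_{IMD}$, we have

$$d_{IMD}(A) = E[F(G, A)] = \sum_{v \in V} E[f(G, A, v)] = \sum_{v \in V} \sum_{x \in X_G} \Pr_v(x, G, A) \cdot g(G, x, A, v) \qquad (1)$$

where $F(G, A)$ represents the size of the influenced node set of $A$ on network $G$ with fixed choices of $P$ and $C$, and $f(G, A, v)$ is 1 or 0, representing whether node $v$ is in the influenced node set of $A$ on $G$.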
In equation (1), given network $G$, we use $X_G$ to represent the set of all different live-blocked assignments on $E_G$ and $x$ to represent a particular assignment of $E_G$. Given $x \in X_G$, we use $G_x$ to represent the graph obtained from $G$ by deleting blocked edges. It should be noted that, even for the same $x$, the probabilities of the assignment under different seed sets $A$ are different. We use $\Pr_v(x, G, A)$ to represent those probabilities. The function $g(G, x, A, v)$ is the value of $f(G, A, v)$ on the particular assignment $x$. The value of $g$ is determined as follows. First, according to $x$, a subgraph $G'$ of $G$ is obtained by taking only live edges into $E_{G'}$. Then, given seed set $A$, $g$ is 1 if $v$ can be influenced by $A$ on $G'$ and 0 otherwise. Finally, equation (1) follows directly from the definition of $d_{IMD}$.
In the following, we show that $d_{IMD}$ is monotone and submodular.
Lemma 2. The function $g$ in equation (1) is monotone.
Proof. We prove that $g$ is monotone by analyzing the connectivity between node $v$ and the seed set $A$. We need to show that, given $S_1 \subseteq S_2$, $g(G, x, S_1, v) \le g(G, x, S_2, v)$. Since the value of $g$ can only be 1 or 0, it suffices to show that $g(G, x, S_1, v) = 1$ and $g(G, x, S_2, v) = 0$ cannot hold at the same time. Assume that $g(G, x, S_1, v) = 1$ and $g(G, x, S_2, v) = 0$. Because $g(G, x, S_1, v) = 1$, letting $G'$ be the subgraph of $G$ obtained by keeping only the live edges identified by $x$ in $E_G$, there must be some node $u \in S_1$ such that $u$ and $v$ are connected in $G'$. Since $S_1 \subseteq S_2$, we have $u \in S_2$. Then, $v$ is connected to some node in $S_2$, that is, $g(G, x, S_2, v) = 1$, which is a contradiction. Therefore, $g(G, x, S_1, v) \le g(G, x, S_2, v)$.
Naturally, it is hoped that $\Pr_v(x, G, A)$ is also monotone. If so, we could obtain the result that $d_{IMD}(\cdot)$ is monotone directly from equation (1). Unfortunately, we have the following lemma.
Lemma 3. The function $\Pr_v(x, G, \cdot)$ is not monotone.
Proof. This can be seen from the example in Figure 4. Assume that $c_{uv} = 0.5$ for every edge $(u, v) \in G$, $S_1 = \{v_a\}$, and $S_2 = \{v_a, v_b\}$. For $S_1$, the edges $(v_a, v_b)$ and $(v_a, v_c)$ are visited in the first iteration, and the edge $(v_b, v_c)$ is visited in the second iteration. Therefore, $\Pr_v(x, G, S_1) = 1 \cdot 0.9 \cdot (1 - 0.1 \cdot 0.5) = 0.855$. For $S_2$, the edge $(v_a, v_b)$ is not processed and the other two edges are visited in the first iteration. Therefore, $\Pr_v(x, G, S_2) = 0.9 \cdot (1 - 0.1) = 0.81$. Finally, we have

$$\Pr_v(x, G, S_1) > \Pr_v(x, G, S_2) \qquad (2)$$

For another example, let us modify the original graph $G$ into $H$ as shown in Figure 4. Similarly, we have $\Pr_v(x, H, S_1) = 0.9 \cdot 0.9 = 0.81$ and $\Pr_v(x, H, S_2) = 0.9$. Therefore,

$$\Pr_v(x, H, S_1) < \Pr_v(x, H, S_2) \qquad (3)$$

Since enlarging the seed set decreases the probability in (2) and increases it in (3), $\Pr_v(x, G, \cdot)$ is not monotone.
Since $\Pr_v(x, G, \cdot)$ is not monotone, it is hard to prove the theorem directly based on equation (1), and we need an alternative view of $d_{IMD}(\cdot)$.
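The arithmetic of the first example can be reproduced with a small sketch. Figure 4 is not reproduced here, so the edge probabilities below ($1$ for $(v_a, v_b)$, $0.9$ for $(v_a, v_c)$, and $0.1$ for $(v_b, v_c)$) and the assignment (the first two edges live, the third blocked) are assumptions inferred from the products quoted in the proof. The function name `assignment_probability` is hypothetical, and the rule that an edge is processed whenever its tail is reached and its head is not a seed is likewise read off the example rather than taken from the formal definition.

```python
from collections import deque


def assignment_probability(p, c, live, seeds):
    """Probability of a live/blocked assignment for a given seed set.

    p, c  : dicts mapping edge (u, v) to p_uv and c_uv
    live  : set of edges marked live in the assignment
    seeds : collection of seed nodes (depth 0)
    """
    # Depth of every node reached from the seeds through live edges.
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for (a, b) in p:
            if a == u and (a, b) in live and b not in depth:
                depth[b] = depth[u] + 1
                queue.append(b)

    prob = 1.0
    for (u, v), p_uv in p.items():
        if u not in depth or v in seeds:
            continue  # edge is never processed
        # Edge is tried at iteration depth[u] + 1, so the exponent is depth[u].
        q = p_uv * c[(u, v)] ** depth[u]
        prob *= q if (u, v) in live else 1.0 - q
    return prob


if __name__ == "__main__":
    p = {("va", "vb"): 1.0, ("va", "vc"): 0.9, ("vb", "vc"): 0.1}
    c = {edge: 0.5 for edge in p}
    live = {("va", "vb"), ("va", "vc")}
    print(assignment_probability(p, c, live, ["va"]))
    print(assignment_probability(p, c, live, ["va", "vb"]))
```

Under these assumptions the two calls give approximately 0.855 and 0.81, matching inequality (2): adding $v_b$ to the seed set moves the edge $(v_b, v_c)$ from the second iteration to the first, which changes the probability of the same assignment.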

Theorem 2. The influence function $d_{IMD}(\cdot)$ is monotone.
Proof. To prove that the influence function $d_{IMD}$ is monotone, consider a given network $G = (V, E, P, C)$ and two fixed seed node sets $S_1$ and $S_2$ satisfying $S_1 \subseteq S_2$. We need to show that $d_{IMD}(S_1) \le d_{IMD}(S_2)$. The influence function can be written as a sum over nodes:

$$d_{IMD}(A) = \sum_{v \in V} Q_v(G, A, p) \qquad (4)$$

Here, we use $Q_v(G, A, p)$ to represent the probability that node $v$ can be influenced by $A$ on the network obtained from $G$ by selecting live edges with the probabilities defined in the information diffusion procedure. The parameter $p$ is a subset of $P = \{p_{uv} \mid (u, v) \in E\}$. Actually, in equation (1), the parameter $p$ is implicitly contained in $G$. In equation (4), the aim of separating $p$ from $G$ is to show which variables in $P$ are essential to the value of $Q_v$. Therefore, among the variables in $P$, we assume that the function $Q_v$ only involves those in $p$. For $Q_v$, we have the following observations.
(1) The set $p$ only includes the variables $\{p_{mn} \mid \exists w \in A \text{ such that } (m, n) \text{ appears on a path between } w \text{ and } v\}$. We prove this by showing that changing the value of any $p_{mn}$ outside $p$ does not affect the value of $Q_v$. Comparing equation (1) with equation (4), it is easy to see that $Q_v$ is the sum of several terms $\Pr_v$, each of which is the probability of some assignment in $X_G$. Assume that edge $(s, t)$ does not appear on any path between $v$ and the nodes in $A$. Given some $x \in X_G$ such that $\Pr_v(x, G, A)$ is included in $Q_v$, first assume that $(s, t)$ is labeled "blocked" in $x$. The term $\Pr_v(x', G, A)$ for the assignment $x'$ obtained from $x$ by changing the state of $(s, t)$ to "live" must also appear in $Q_v$, because adding the edge $(s, t)$ to $G_x$ does not disconnect the paths between $A$ and $v$, so $v$ is still influenced by $A$ in $G_{x'}$. Conversely, let $(s, t)$ be labeled "live" in $x$ and let $x'$ be obtained from $x$ by changing the state of $(s, t)$ to "blocked". Because $(s, t)$ does not appear on any path between $A$ and $v$, deleting $(s, t)$ does not destroy the connectivity between $v$ and $A$. That is, $v$ is still influenced by $A$ in $G_{x'}$, and $\Pr_v(x', G, A)$ appears in $Q_v$. Since both $\Pr_v(x, G, A)$ and $\Pr_v(x', G, A)$ appear in $Q_v$ and the only difference between $x$ and $x'$ is the state of edge $(s, t)$, the sum of $\Pr_v(x, G, A)$ and $\Pr_v(x', G, A)$ eliminates the variable $p_{st}$. Thus, $p_{st}$ does not appear in the expression of $Q_v$, and whatever value $p_{st}$ takes, the value of $Q_v$ does not change. Finally, $Q_v$ can be written as an expression without any variables outside $p$.
(2) $Q_v$ is monotone with respect to each variable $p_{mn}$ in $p$. For any assignment $x \in X_G$ and its corresponding graph $G_x$, if the value of $p_{mn}$ in $p$ increases, the edge $(m, n)$ is more likely to appear in $G_x$. If no path through $(m, n)$ connects $A$ and $v$, the change in $p_{mn}$ does not affect the probability that $v$ is influenced in $G_x$. Otherwise, if there is a path passing through $(m, n)$ and connecting $A$ and $v$, the probability that $v$ is influenced in $G_x$ becomes larger. Therefore, an increase in $p_{mn}$ does not reduce the value of $Q_v$, and $Q_v$ is indeed monotone with respect to the variables in $p$.
(3) $Q_v$ is monotone with respect to $A$. Given $S \subseteq V$, let $y \in V \setminus S$ and $S' = S \cup \{y\}$. We show that $Q_v(G, S, p) \le Q_v(G, S', p)$. We divide all edges involved in the information diffusion procedure into several parts according to the iteration step at which they are used. For example, observing the information diffusion procedure shown in Figure 2, every edge can be labeled with a number representing the step at which it is used. For example, in Figure 2, node A is labeled 0, node B is labeled 3, and node C is labeled 2. According to the definition of $d_{IMD}$, the success probability of each edge $(u, v)$ is computed from this label. The smaller the label, the less the influence is reduced and the more likely the edge is used successfully. After inserting some node $y$ into the seed set, the labels on some edges decrease during the information diffusion procedure, and the probabilities corresponding to those edges increase. Since we have shown that $Q_v$ is monotone with respect to the variables in $p$, it is easy to verify that the value of $Q_v$ does not decrease when node $y$ is inserted. Finally, $Q_v$ is monotone with respect to $A$.
Based on the above observations and equation (4), we have $d_{IMD}(S_1) \le d_{IMD}(S_2)$; that is, $d_{IMD}(\cdot)$ is monotone.

Theorem 3. The influence function $d_{IMD}(\cdot)$ is submodular.
Proof. To prove that the influence function $d_{IMD}$ is submodular, consider a given network $G = (V, E, P, C)$, two fixed seed node sets $S_1$ and $S_2$ satisfying $S_1 \subseteq S_2$, and one node $u \in V \setminus S_2$. We need to show that $d_{IMD}(S_1 \cup \{u\}) - d_{IMD}(S_1) \ge d_{IMD}(S_2 \cup \{u\}) - d_{IMD}(S_2)$. Following the idea of the proof of Theorem 2, we base the proof on equation (4) and the function $Q_v$, and show the submodular property for each $Q_v$. For a fixed node $v$, consider the relationship between $u$ and $v$.
For simplicity, we use $Q_v(S)$ to represent $Q_v(G, S, p)$.
The first case is that $u$ cannot reach $v$; that is, there are no paths from $u$ to $v$ in $G$. Obviously, the addition of $u$ does not add paths between the seed set and node $v$, so $u$ does not directly increase the $Q_v$ values defined over $S_1$ and $S_2$. Another way of changing the value of $Q_v$ is that $u$ affects the topology structures of $S_1$ and $S_2$. Suppose there is a path $p$ between some node $w \in S_1$ and $v$, and the addition of $u$ changes the iteration level of some edge $(y, z)$ of $p$; then the value of $Q_v$ would change because of the change in influence probability for $(y, z)$. However, in that case, there would also be a path between $u$ and $v$ through $y$, which contradicts the assumption. Therefore, when $u$ cannot reach $v$ in $G$, we have $Q_v(S_1 \cup \{u\}) - Q_v(S_1) = Q_v(S_2 \cup \{u\}) - Q_v(S_2) = 0$.
The second case is that there are paths between $u$ and $v$. It should be noted that the topology structure constructed by the information diffusion procedure is in fact a tree, and the edges between trees are eliminated by the mechanism that every node can be activated at most once. Therefore, we can divide all paths into three types: the paths starting from $S_1$, the paths starting from $S_2$, and the paths starting from $u$. We use $P_1$, $P_2$, and $P_u$ to denote them, respectively. The value of $Q_v$ changes because edges move from one set to another when $u$ is added. Consider a particular edge $e = (y, z)$. We ignore the trivial cases in which $e$ does not move between sets, since the value of $Q_v$ does not change. We have the following two observations. (1) If $e$ moves from $P_1$ to $P_u$ after inserting $u$ into $S_1$, it also moves from $P_2$ to $P_u$ because $S_1 \subseteq S_2$. (2) If $e$ moves from $P_1$ to $P_u$ or from $P_2$ to $P_u$, $u$ can reach $v$ through the edge $e$. Therefore, for node $v$, we have $Q_v(S_1 \cup \{u\}) - Q_v(S_1) \ge Q_v(S_2 \cup \{u\}) - Q_v(S_2)$. By combining the above two results and equation (4), we obtain that $d_{IMD}(\cdot)$ is submodular.

Theorem 4. Algorithm GREEDYIMD can solve the IMD problem with a $(1 - 1/e)$ approximation ratio.
Proof. According to the result in the work by Kempe et al., 2 together with Theorems 2 and 3, it is easy to verify that Algorithm GREEDYIMD solves the IMD problem with a $(1 - 1/e)$ approximation ratio.

Improving Algorithm GREEDYIMD
In Algorithm GREEDYIMD, many redundant operations are still performed while simulating the information diffusion procedure, and we can improve the algorithm by merging and eliminating those operations. The main idea of this part can be explained by the following example. Consider an extreme case: suppose that in $G$, the original graph of the ID model, there is a subset $V' \subseteq V$ such that $V'$ forms a connected component and the nodes in $V'$ have no edges to the nodes outside. It is easy to check that if the seed set $A$ satisfies $A \subseteq V'$, we need not process the edges outside $V'$ in the algorithm for solving the influence maximization problem. Obviously, we can improve the performance of GREEDYIMD by removing such edges. In fact, the optimization idea comes from the observation that sparse subgraphs commonly exist in real applications. In those cases, if some node is specified as the beginning of the diffusion, some part of the whole graph will never be visited because of the sparse part of the graph. Therefore, for a given node $a$, finding and eliminating the part that will never be visited with $a$ as the beginning node is useful for improving the efficiency of the algorithm.
Based on the idea above, we propose an improved procedure, GETINFLUENCEIMPROVED, to solve influence maximization under the ID model. The improved algorithm is shown in Figure 5. First, given the input $I$ and $S$, to simplify the computation and increase the chances of removing redundant edges, GETINFLUENCEIMPROVED considers the nodes one by one and estimates their expected influences (line 2). For each node $v$, a graph $G'$ with the reversed edges of $G$ is built (line 3). Then, by making a traversal over $G'$ from the node $v$, we can find which edges are related to $v$ during the information diffusion procedure (lines 4-5). It should be noted that if the node $v$ has no relation to the given seed set $S$, we can detect this and ignore $v$ in this step. Also, a traversal from $v$ need not reach all nodes in $S$; therefore, the unrelated seed nodes are also filtered out in this step. By extracting the edges among the obtained node set, we get a graph $G_D$ that is much smaller than $G$, and the simulation operations are performed only on $G_D$ (line 6). For each graph $G_D$, the simulation process is run multiple times to get a precise estimation. In each simulation, first the edges $E_1$ in $G_D$ are extracted and added to a queue structure (lines 10-14). Then, the following operations are similar to GETINFLUENCE. The average influence size obtained by the multiple simulations is added to the final result (lines 29-30). Finally, the result is returned (line 31). The optimization is thus implemented by preprocessing the original graph and labeling the nodes that are not useful. The time cost of GETINFLUENCEIMPROVED can be bounded by $O(|V| \cdot n \cdot |E|^2)$. Compared with GETINFLUENCE, the worst-case time cost is increased because there are cases in which no subgraph can be eliminated by the optimization steps.
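The pruning step can be sketched as a backward reachability search. The code below is an illustration, not the listing of Figure 5: the adjacency-list representation and the name `relevant_subgraph` are assumptions. For a target node `v` it builds the reversed graph, traverses it from `v`, and keeps only the nodes, edges, and seeds that can reach `v`; the simulation for `v` would then run on this smaller graph.

```python
from collections import deque


def relevant_subgraph(graph, seeds, v):
    """Return (subgraph, kept_seeds) restricted to nodes that can reach v.

    graph : dict mapping each node to a list of out-neighbours
    seeds : iterable of seed nodes
    v     : target node whose activation probability is to be estimated
    """
    # Build the reversed adjacency lists.
    reverse = {}
    for u, nbrs in graph.items():
        for w in nbrs:
            reverse.setdefault(w, []).append(u)

    # Backward breadth-first traversal from v.
    reach = {v}
    queue = deque([v])
    while queue:
        w = queue.popleft()
        for u in reverse.get(w, ()):
            if u not in reach:
                reach.add(u)
                queue.append(u)

    # Seeds that cannot reach v are irrelevant for v and are filtered out.
    kept_seeds = [s for s in seeds if s in reach]
    # Keep only edges whose endpoints both lie in the reachable set.
    sub = {u: [w for w in graph.get(u, ()) if w in reach] for u in reach}
    return sub, kept_seeds


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": [], "x": ["y"], "y": []}
    print(relevant_subgraph(graph, ["a", "x"], "c"))
```

If `kept_seeds` is empty, the node `v` cannot be influenced by the seed set and can be skipped without any simulation, which is the situation described above for nodes unrelated to $S$.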

Experimental evaluation
Based on real data sets, we evaluate the performance of Algorithm GREEDYIMD and GETINFLUENCEIMPROVED and compare them with some current influence maximization algorithms. All code is implemented in C++, and all experiments are run on a personal computer with an Intel Quad CPU at 2.33 GHz and 8 GB of main memory. All running-time experiments are run five times, and the average values are reported.

Experiment setup
We ran our experiments on four real data sets, whose summary information is shown in Table 1. The digital bibliography & library project (DBLP) data set is a large network of research collaboration maintained by Michael Ley. In the DBLP network, the nodes represent the authors of academic papers, and there is an edge between two nodes if and only if the two corresponding authors have collaborated. For DBLP, we use the coauthor relationships to compute the influence probability between two authors. Twitter, which is the most popular micro-blogging system in the world, is the network composed of Twitter users and the tweets posted by them. In this network, the nodes represent the users and the edges represent the "following" relations between them. For Twitter, we use the repost actions to compute the influence probability. Epinions is a network built from who-trust-whom relations. In this network, nodes represent the users and the edges between them represent the trust relation. We use the

Experimental results and analysis
We compare the algorithms proposed in this article in terms of seed set quality and running time on the four real data sets. We use different parameters of $L$ and $C$ to run the experiments. Here, given a constant $c$, the value of function $C$ on each edge is generated randomly following a Poisson distribution. Similarly, $L$ values are generated randomly for each pair of nodes. In the following parts, we use MID-A to represent the GREEDYIMD algorithm running with $L$ and $C$ values generated by ($l = 0.2$, $c = 0.2$). Also, we use MID-B, MID-C, and MID-D to represent the GREEDYIMD algorithm with parameters ($l = 0.2$, $c = 0.8$), ($l = 0.8$, $c = 0.2$), and ($l = 0.8$, $c = 0.8$), respectively. When comparing running times, for clarity, we use IMD-A and IMD-D to represent the GREEDYIMD algorithm with the GETINFLUENCE procedure, and IMD-Ax and IMD-Dx to represent the GREEDYIMD algorithm with the GETINFLUENCEIMPROVED procedure.

Effects of seed sets
The effect of a given seed set is evaluated by the number of influenced nodes. On the four data sets, we compare the four algorithm settings on seed node sets of different sizes. The results are shown in Figure 6. As the size of the seed node set increases, the number of influenced nodes increases almost linearly. This result is expected, since in a large enough network all nodes tend to behave uniformly during the information diffusion. Also, when the values of the $L$ and $C$ functions increase, the influenced node set gets larger. When $L$ becomes larger, it essentially increases the probability of influence, so the influenced set gets larger. When $C$ becomes larger, the influence decays more slowly over time, so the influenced set also gets larger. This is because the two parameters control the influence abilities of the seed nodes, which will become clearer in the following experimental results.

Effects of function L
The effect of function $L$ is evaluated by the number of influenced nodes. On the four data sets, fixing the size of the seed node set to 20, we compare the four algorithm settings with different $L$ function values. The results are shown in Figure 7. As the value of $L$ increases, the number of influenced nodes increases, but the growth of the influenced node set slows down. In fact, the effect of the $L$ values is to raise the original influence probability to a proper level; therefore, when the $L$ value becomes much larger, the growth of the influenced set slows down. Also, for a fixed $L$ value, when we scale $C$ from 0.2 to 0.8, the influenced node set gets larger.

Effects of function C
The effect of function $C$ is evaluated by the number of influenced nodes. On the four data sets, fixing the size of the seed node set to 20, we compare the four algorithm settings with different $C$ function values. The results are shown in Figure 8. The results are similar to those for the $L$ function. It should be noted that when the value of $C$ is relatively small, the size of the influenced node set increases more sharply than in the results for the $L$ function. This follows from the working mechanisms of $L$ and $C$: $L$ is used in a linear way in the information diffusion procedure, while $C$ is used in an exponential way.

Running time
For two data sets, we ran the original greedy algorithm and the improved greedy algorithm proposed in this article for different seed set sizes. The running time results are shown in Figure 9. As the size of the seed set increases, the running time also increases; when the seed set becomes larger, the growth of the running time slows down. Also, the improved algorithms are about three times faster on average than the original algorithms. This is because of the reduction in computation cost achieved by the optimizing strategies.

Conclusion
In this article, based on observations of the information diffusion process on mobile social networks, the ID model for diffusing information under dynamic influence is proposed. By theoretical analysis, we determine the complexity of solving influence maximization on the new model and design efficient algorithms with an approximation performance guarantee. Experiments on real data sets verify the performance of the ID model and the proposed algorithms. One possible further question is how to design more efficient algorithms for dynamic influence in social networks. Another question comes from the methods of modeling dynamic influence in this article. Our methods cannot cover all possible forms of dynamic influence, and we need to investigate more typical representations of dynamic influence and study how to design algorithms for the related influence maximization problems.