A Dynamic Users’ Interest Discovery Model with Distributed Inference Algorithm

One of the key issues in providing users with user-customized or context-aware services is to automatically detect latent topics, users' interests, and their changing patterns from large-scale social network information. Most current methods are devoted either to discovering static latent topics and users' interests or to analyzing topic evolution only from the intrafeatures of documents, namely, text content, without directly considering extrafeatures of documents such as authors. Moreover, they are applicable only to a single processor. To resolve these problems, we propose a dynamic users' interest discovery model with a distributed inference algorithm, named the Distributed Author-Topic over Time (D-AToT) model. A collapsed Gibbs sampling method following the main idea of MapReduce is utilized for inferring the model parameters. The proposed model can discover latent topics and users' interests, and mine their changing patterns over time. Extensive experimental results on the NIPS (Neural Information Processing Systems) dataset show that our D-AToT model is feasible and efficient.


Introduction
With a dynamic users' interest discovery model, one can answer a range of important questions about the content of information uploaded or shared to a social network service (SNS): which topics each user prefers, which users are similar to each other in terms of their interests, which users are likely to have written documents similar to an observed document, and which users are influential at different stages of topic evolution. It also helps characterize users as pioneers, mainstream, or laggards in different subject areas.
In fact, when people enjoy SNS on their smart devices, including phones and tablets, each user's interests are usually not static. However, the above models are devoted to discovering static latent topics and users' interests. Moreover, they are applicable only to a single processor. Of course, one can perform some post hoc or pre hoc analysis [4,13] to discover changing patterns over time, but this misses the opportunity for time to improve topic discovery [14], and it is very difficult to align corresponding topics [15]. Current attention on dynamic models is mainly focused on analyzing topic evolution only from text content, such as the Dynamic Topic Model (DTM) [16], the continuous time DTM (cDTM) [17], and Topic over Time (ToT) [14].

Figure 1: The illustration for discovering dynamic users' interests.

This paper mainly focuses on a dynamic users' interest discovery model, especially collapsed Gibbs sampling following the main idea of MapReduce [18]. Figure 1 gives a detailed illustration of discovering dynamic users' interests. Our previous work [19,20] is limited to an inference algorithm on a single processor.
The rest of this work is organized as follows. In Section 2, we first discuss two related generative models, the Author-Topic (AT) model and the Topic over Time (ToT) model, and then introduce in detail our proposed Author-Topic over Time (AToT) model. Sections 3 and 4 describe the collapsed Gibbs sampling method used for inferring the model parameters and its distributed version, respectively. In Section 5, extensive experimental evaluations are conducted, and Section 6 concludes this work.

Generative Models for Documents
Before presenting our Author-Topic over Time (AToT) model, we first describe two related generative models: the AT model and the ToT model. The notation is summarized in Table 1.

Author-Topic (AT) Model.
Rosen-Zvi et al. [3-5] propose an Author-Topic (AT) model for extracting information about authors and topics from large text collections. They model documents as if they were generated by a two-stage stochastic process: an author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multiauthor paper is a mixture of the distributions associated with its authors.
The graphical model representation of the AT model is shown in Figure 2. The AT model can be viewed as a generative process, which can be described as follows.
(1) For each topic $z \in [1, T]$, (i) draw a multinomial $\phi_z$ from Dirichlet($\beta$); (2) for each author $a \in [1, A]$, (i) draw a multinomial $\theta_a$ from Dirichlet($\alpha$); (3) for each word $w_{d,i}$, $i \in [1, N_d]$, in document $d \in [1, D]$, (i) draw an author assignment $x_{d,i}$ uniformly from the group of authors $\mathbf{a}_d$; (ii) draw a topic assignment $z_{d,i}$ from Multinomial($\theta_{x_{d,i}}$); (iii) draw a word $w_{d,i}$ from Multinomial($\phi_{z_{d,i}}$).

Topic over Time (ToT) Model.
Unlike other dynamic topic models that rely on Markov assumptions or discretization of time, each topic in the Topic over Time (ToT) model [14] is associated with a continuous distribution over timestamps, and, for each generated document, the mixture distribution over topics is influenced by both word cooccurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time.

The graphical model representation of the ToT model is shown in Figure 3. ToT is a generative model of timestamps and the words in timestamped documents. The generative process can be described as follows. (1) For each topic $z \in [1, T]$, (i) draw a multinomial $\phi_z$ from Dirichlet($\beta$); (2) for each document $d \in [1, D]$, (i) draw a multinomial $\theta_d$ from Dirichlet($\alpha$); (ii) for each word $w_{d,i}$, $i \in [1, N_d]$, (a) draw a topic assignment $z_{d,i}$ from Multinomial($\theta_d$); (b) draw a word $w_{d,i}$ from Multinomial($\phi_{z_{d,i}}$); (c) draw a timestamp $t_{d,i}$ from Beta($\psi_{z_{d,i}}$).

Author-Topic over Time (AToT) Model.
The graphical model representation of the AToT model is shown in Figure 4. The AToT model can be viewed as a generative process, which can be described as follows. (1) For each topic $z \in [1, T]$, (i) draw a multinomial $\phi_z$ from Dirichlet($\beta$); (2) for each author $a \in [1, A]$, (i) draw a multinomial $\theta_a$ from Dirichlet($\alpha$); (3) for each word $w_{d,i}$, $i \in [1, N_d]$, in document $d \in [1, D]$, (i) draw an author assignment $x_{d,i}$ uniformly from the group of authors $\mathbf{a}_d$; (ii) draw a topic assignment $z_{d,i}$ from Multinomial($\theta_{x_{d,i}}$); (iii) draw a word $w_{d,i}$ from Multinomial($\phi_{z_{d,i}}$); (iv) draw a timestamp $t_{d,i}$ from Beta($\psi_{z_{d,i}}$).
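As a concrete illustration, the AToT generative process above can be rendered in a few lines of Python. This is a toy sketch, not the authors' code: the sizes, hyperparameters, and variable names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): T topics, A authors, V vocabulary words.
T, A, V = 5, 4, 50
alpha, beta = 0.5, 0.1

phi = rng.dirichlet([beta] * V, size=T)      # (1) phi_z ~ Dirichlet(beta), per topic
theta = rng.dirichlet([alpha] * T, size=A)   # (2) theta_a ~ Dirichlet(alpha), per author
psi = rng.uniform(1.0, 5.0, size=(T, 2))     # per-topic Beta timestamp parameters (assumed)

def generate_document(authors, n_words):
    """Generate one document's tokens following the AToT generative process (3)."""
    words, topics, times = [], [], []
    for _ in range(n_words):
        x = rng.choice(authors)              # (i) author drawn uniformly from a_d
        z = rng.choice(T, p=theta[x])        # (ii) topic from Multinomial(theta_x)
        w = rng.choice(V, p=phi[z])          # (iii) word from Multinomial(phi_z)
        t = rng.beta(psi[z, 0], psi[z, 1])   # (iv) timestamp from Beta(psi_z)
        words.append(w); topics.append(z); times.append(t)
    return words, topics, times

words, topics, times = generate_document(authors=[0, 2], n_words=30)
```

Note how the only difference from the AT model is step (iv): each token additionally emits a timestamp from its topic's Beta distribution.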
From the above generative process, one can see that the AToT model is parameterized as follows: $\theta_a \mid \alpha \sim \text{Dirichlet}(\alpha)$, $\phi_z \mid \beta \sim \text{Dirichlet}(\beta)$, $z_{d,i} \mid \theta_{x_{d,i}} \sim \text{Multinomial}(\theta_{x_{d,i}})$, $w_{d,i} \mid \phi_{z_{d,i}} \sim \text{Multinomial}(\phi_{z_{d,i}})$, and $t_{d,i} \mid \psi_{z_{d,i}} \sim \text{Beta}(\psi_{z_{d,i}})$. As a matter of fact, a paper is usually written mainly by the first author and the reprint author. If one wants to differentiate the contributions of the first author and the reprint author from those of other coauthors, it is very easy for the AToT model to set different weights for different authors. But since there are no criteria to guide the choice of such weights, we set equal weights for all coauthors in this work; that is to say, $x_{d,i} \mid \mathbf{a}_d$ follows the uniform distribution.

Inference Algorithm
For inference, the task is to estimate the following sets of unknown parameters in the AToT model: (1) the distributions $\Theta = \{\theta_a\}_{a=1}^{A}$, $\Phi = \{\phi_z\}_{z=1}^{T}$, and $\Psi = \{\psi_z\}_{z=1}^{T}$ and (2) the corresponding topic and author assignments $z_{d,i}$, $x_{d,i}$ for each word token $w_{d,i}$. In fact, inference cannot be done exactly in this model. A variety of algorithms have been used to estimate the parameters of topic models, such as variational EM (expectation maximization) [21,22], expectation propagation [23,24], belief propagation [25], and Gibbs sampling [19,20,26,27]. In this work, the collapsed Gibbs sampling algorithm [26] is used, since it provides a simple method for obtaining parameter estimates under Dirichlet priors and allows combining estimates from several local maxima of the posterior distribution.
In the Gibbs sampling procedure, we need to calculate the conditional distribution $P(z_{d,i}, x_{d,i} \mid \mathbf{w}, \mathbf{z}_{\neg(d,i)}, \mathbf{x}_{\neg(d,i)}, \mathbf{t}, \mathbf{a}, \alpha, \beta, \Psi)$, where $\mathbf{z}_{\neg(d,i)}$ and $\mathbf{x}_{\neg(d,i)}$ represent the topic and author assignments for all tokens except $w_{d,i}$, respectively. We begin with the joint distribution $P(\mathbf{w}, \mathbf{z}, \mathbf{x}, \mathbf{t} \mid \mathbf{a}, \alpha, \beta, \Psi)$ of a dataset, and, using the chain rule, we can obtain the conditional probability conveniently as

$$P(z_{d,i}, x_{d,i} \mid \mathbf{w}, \mathbf{z}_{\neg(d,i)}, \mathbf{x}_{\neg(d,i)}, \mathbf{t}, \mathbf{a}, \alpha, \beta, \Psi) \propto \frac{t_{d,i}^{\psi_{z_{d,i},1}-1}\,(1-t_{d,i})^{\psi_{z_{d,i},2}-1}}{B(\psi_{z_{d,i},1}, \psi_{z_{d,i},2})} \cdot \frac{n_{z_{d,i}}^{(w_{d,i})} + \beta - 1}{\sum_{v=1}^{V} n_{z_{d,i}}^{(v)} + V\beta - 1} \cdot \frac{n_{x_{d,i}}^{(z_{d,i})} + \alpha - 1}{\sum_{z=1}^{T} n_{x_{d,i}}^{(z)} + T\alpha - 1}, \quad (1)$$

where $n_z^{(v)}$ is the number of times tokens of word $v$ are assigned to topic $z$ and $n_a^{(z)}$ represents the number of times author $a$ is assigned to topic $z$. A detailed derivation of Gibbs sampling for AToT is provided in the appendix.
If one further manipulates the above (1), one can turn it into separate update equations for the topic and author of each token, suitable for random or systematic scan updates:

$$P(z_{d,i} \mid \mathbf{w}, \mathbf{z}_{\neg(d,i)}, \mathbf{x}, \mathbf{t}, \alpha, \beta, \Psi) \propto \frac{t_{d,i}^{\psi_{z_{d,i},1}-1}\,(1-t_{d,i})^{\psi_{z_{d,i},2}-1}}{B(\psi_{z_{d,i},1}, \psi_{z_{d,i},2})} \cdot \frac{n_{z_{d,i}}^{(w_{d,i})} + \beta - 1}{\sum_{v=1}^{V} n_{z_{d,i}}^{(v)} + V\beta - 1} \cdot \left(n_{x_{d,i}}^{(z_{d,i})} + \alpha - 1\right), \quad (2)$$

$$P(x_{d,i} \mid \mathbf{w}, \mathbf{z}, \mathbf{x}_{\neg(d,i)}, \mathbf{a}, \alpha) \propto \frac{n_{x_{d,i}}^{(z_{d,i})} + \alpha - 1}{\sum_{z=1}^{T} n_{x_{d,i}}^{(z)} + T\alpha - 1}. \quad (3)$$

During parameter estimation, the algorithm keeps track of two large data structures: an $A \times T$ count matrix $n_a^{(z)}$ and a $T \times V$ count matrix $n_z^{(v)}$. From these data structures, one can easily estimate $\Phi$ and $\Theta$ as follows:

$$\phi_{z,v} = \frac{n_z^{(v)} + \beta}{\sum_{v'=1}^{V} n_z^{(v')} + V\beta}, \quad (4) \qquad \theta_{a,z} = \frac{n_a^{(z)} + \alpha}{\sum_{z'=1}^{T} n_a^{(z')} + T\alpha}. \quad (5)$$

As for $\Psi$, similar to [14], for simplicity and speed, we update it after each Gibbs sample by the method of moments [28]:

$$\hat\psi_{z,1} = \bar t_z \left( \frac{\bar t_z (1-\bar t_z)}{s_z^2} - 1 \right), \qquad \hat\psi_{z,2} = (1-\bar t_z) \left( \frac{\bar t_z (1-\bar t_z)}{s_z^2} - 1 \right), \quad (6)$$

where $\bar t_z$ and $s_z^2$ indicate the sample mean and the biased sample variance of the timestamps belonging to topic $z$, respectively. The readers are invited to consult [28] for details. In fact, similar to [14], the Beta distribution is utilized to model the timestamps because, on its support [0, 1], it can take many more shapes (including the bell curve) than the Gaussian distribution. But Wang and McCallum [14] did not provide much detail on how to handle documents whose timestamps are exactly 0 or 1 so that they still have nonzero probability, so the time range of the data is normalized to [0.01, 0.99] in this paper.
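The sampling formula and the moment-matching update lend themselves to a direct implementation. The sketch below is an illustrative Python rendering, not the authors' code; it assumes the caller has already removed the current token from the count matrices, which absorbs the $-1$ terms of (1) into the counts.

```python
import math
import numpy as np

def beta_pdf(t, p1, p2):
    """Beta density t^(p1-1) * (1-t)^(p2-1) / B(p1, p2), as used in (1)."""
    log_b = math.lgamma(p1) + math.lgamma(p2) - math.lgamma(p1 + p2)
    return math.exp((p1 - 1) * math.log(t) + (p2 - 1) * math.log(1 - t) - log_b)

def conditional(w, t, doc_authors, n_zv, n_az, alpha, beta, psi):
    """Unnormalized joint conditional over (topic, author) pairs for one token,
    with the token's own counts already excluded from n_zv and n_az."""
    T, V = n_zv.shape
    probs = np.zeros((T, len(doc_authors)))
    for z in range(T):
        word_term = (n_zv[z, w] + beta) / (n_zv[z].sum() + V * beta)
        time_term = beta_pdf(t, psi[z, 0], psi[z, 1])
        for j, a in enumerate(doc_authors):
            author_term = (n_az[a, z] + alpha) / (n_az[a].sum() + T * alpha)
            probs[z, j] = word_term * author_term * time_term
    return probs

def update_psi(timestamps_by_topic):
    """Method-of-moments update (6): refit a Beta per topic from the sample
    mean and biased sample variance of the timestamps assigned to it."""
    psi = np.ones((len(timestamps_by_topic), 2))
    for z, ts in enumerate(timestamps_by_topic):
        ts = np.asarray(ts, dtype=float)
        if len(ts) < 2:
            continue                      # keep a flat Beta(1, 1) for sparse topics
        m, v = ts.mean(), ts.var()        # biased sample variance, as in (6)
        if v > 0:
            c = m * (1 - m) / v - 1
            psi[z] = [m * c, (1 - m) * c]
    return psi
```

One would sample a new $(z, x)$ pair proportionally to the returned `probs` matrix and update the counts accordingly.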
With (2)-(6), the Gibbs sampling algorithm for the AToT model is summarized in Algorithm 1. The procedure itself uses only seven larger data structures: the count variables $n_a^{(z)}$ and $n_z^{(v)}$, which have dimensions $A \times T$ and $T \times V$, respectively; their row sums $n_a$ and $n_z$, with dimensions $A$ and $T$; the Beta parameters $\Psi$, with dimension $T \times 2$; and the state variables $z_{d,i}$ and $x_{d,i}$, with dimension $N = \sum_{d=1}^{D} N_d$.

Distributed Inference Algorithm
Our distributed inference algorithm, named D-AToT, is inspired by the AD-LDA algorithm [29,30], following the main idea of the well-known distributed programming model MapReduce [18]. The overall distributed architecture for the AToT model is shown in Figure 5. As stated in Figure 5, the documents are distributed over the mappers. The author-topic counts $\{n_a^{(z)}\}$ and topic-word counts $\{n_z^{(v)}\}$ are likewise distributed, denoted as $\{n_{a|p}^{(z)}\}$ and $\{n_{z|p}^{(v)}\}$ on mapper $p$, which are used to temporarily store the local author-topic and topic-word counts. In each iteration, every mapper $p$ samples the topic and author assignments for its local documents and updates its local $n_{a|p}^{(z)}$ and $n_{z|p}^{(v)}$ according to the new topic and author assignments. After each iteration, each mapper sends the local counts to the reducer; the reducer then updates $\Psi$ and broadcasts the global $n_a^{(z)}$, $n_z^{(v)}$, and $\Psi$ to all mappers. After all sampling iterations, the reducer calculates $\Phi$ and $\Theta$ according to (4)-(5).
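One way to realize the reducer's global update, following the AD-LDA scheme [29] that D-AToT is inspired by, is to sum each mapper's count deltas into the broadcast global matrix. The following is a minimal illustrative Python sketch, not the authors' implementation; the toy matrices are made up.

```python
import numpy as np

def reduce_counts(global_counts, local_counts):
    """Reducer step: fold every mapper's local changes into the global count
    matrix by accumulating per-mapper deltas (applies equally to the
    author-topic and topic-word count matrices)."""
    updated = global_counts.copy()
    for local in local_counts:
        updated += local - global_counts   # delta accumulated on this mapper
    return updated

# Toy demonstration with 2 mappers and a 2x2 topic-word count matrix.
global_nzv = np.array([[4, 2], [1, 3]])
# Each mapper starts from the broadcast global counts and, after locally
# sampling its documents for one iteration, holds a modified copy:
mapper_0 = global_nzv + np.array([[1, 0], [0, -1]])
mapper_1 = global_nzv + np.array([[0, 2], [1, 0]])
new_global = reduce_counts(global_nzv, [mapper_0, mapper_1])
```

The broadcast of `new_global` (together with the refitted $\Psi$) back to all mappers then starts the next iteration.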

Experimental Results and Discussions
The NIPS proceedings dataset is utilized to evaluate the performance of our model; it consists of the full text of the 13 years of proceedings of the Neural Information Processing Systems (NIPS) Conferences from 1987 to 1999. The dataset contains 1,740 research papers and 2,037 unique authors. The distribution of the number of papers over the years is shown in Table 2.
In addition to downcasing and removing stop words and numbers, we also remove the words appearing fewer than five times in the corpus. After preprocessing, the dataset contains 13,649 unique words and 2,301,375 word tokens in total. Each document's timestamp is determined by the year of the proceedings. In our experiments, the number of topics $T$ is fixed at 100, and the symmetric Dirichlet priors $\alpha$ and $\beta$ are set to 0.5 and 0.1, respectively. Gibbs sampling is run for 2,000 iterations.
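The preprocessing and timestamp normalization described above can be sketched as follows. This is illustrative Python only: the stop-word list is a tiny stand-in, and `min_count=5` mirrors the five-occurrence threshold; the [0.01, 0.99] range matches the normalization from Section 3.

```python
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in", "to", "is"}  # illustrative subset

def preprocess(docs, min_count=5):
    """Downcase, drop stop words and numbers, then prune rare words."""
    tokenized = [[w.lower() for w in doc.split()
                  if w.lower() not in STOPWORDS and not w.isdigit()]
                 for doc in docs]
    counts = Counter(w for doc in tokenized for w in doc)
    return [[w for w in doc if counts[w] >= min_count] for doc in tokenized]

def normalize_years(years, lo=0.01, hi=0.99):
    """Map proceeding years onto [0.01, 0.99] so the Beta densities stay finite."""
    y0, y1 = min(years), max(years)
    return [lo + (hi - lo) * (y - y0) / (y1 - y0) for y in years]
```

For the NIPS corpus, `normalize_years` maps 1987 to 0.01 and 1999 to 0.99, and every token in a document inherits the document's normalized year as its timestamp.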

Examples of Topic, Author Distributions, and Topic Evolution.
Table 3 illustrates examples of 8 topics learned by the AToT model. The topics are extracted from a single sample at the 2000th iteration of the Gibbs sampler. Each topic is illustrated with (a) the top 10 words most likely to be generated conditioned on the topic, (b) the top 10 authors with the highest probability conditioned on the topic, and (c) histograms and fitted Beta PDFs showing the topic's evolution pattern over time.

Author Interest Evolution Analysis.
In order to further analyze author interest evolution, we take Sejnowski T as an example, who published 43 papers in total from 1987 to 1999 in the NIPS conferences, as shown in Figure 6(a). The research interest evolution for Sejnowski T is reported in Figure 6(b), in which the area occupied by a square is proportional to the strength of his research interest. From Figure 6(b), one can see that Sejnowski T's research interest focused mainly on Topic 51 (Eye Recognition and Factor Analysis), Topic 37 (Neural Networks), and Topic 58 (Data Model and Learning Algorithm), but with different emphasis from 1987 to 1999. In the early phase (1989-1993), Sejnowski T's research interest was limited to Topic 51; it then extended to Topic 37 in 1994 and to Topic 58 in 1996 with great research interest strength, and finally returned to Topic 51 after 1997. In any case, Sejnowski T did not change his main research direction, Topic 51, which is verified by his homepage.

Predictive Power Analysis.
Similar to [5], we further divide the NIPS papers into a training set D_train of 1,557 papers and a test set D_test of 183 papers, of which 102 are single-authored papers. Each author in D_test must have authored at least one of the training papers. The perplexity, originally used in language modeling [31], is a standard measure for estimating the performance of a probabilistic model. The perplexity of a test document $\mathbf{w}_m \in D_{test}$ is defined as the exponential of the negative normalized predictive likelihood under the model:

$$\text{Perplexity}(\mathbf{w}_m \mid \mathbf{a}_m) = \exp\left(-\frac{\ln P(\mathbf{w}_m \mid \mathbf{a}_m)}{N_m}\right),$$

with

$$P(\mathbf{w}_m \mid \mathbf{a}_m) = \prod_{i=1}^{N_m} \frac{1}{|\mathbf{a}_m|} \sum_{a \in \mathbf{a}_m} \sum_{z=1}^{T} \theta_{a,z}\, \phi_{z, w_{m,i}}.$$

We approximate the integrals over $\Phi$ and $\Theta$ using the point estimates obtained in (4)-(5) for each sample $s \in \{1, 2, \ldots, 10\}$ of assignments $\mathbf{x}, \mathbf{z}$ and then average over samples. Figure 7 shows the results for the AToT model and the AT model in a post hoc fashion on the 102 single-authored papers. It is not difficult to see that the perplexity of the AToT model is smaller than that of the AT model when the number of topics is greater than 10, which indicates that the AToT model outperforms the AT model.
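Under the point estimates of $\Phi$ and $\Theta$, the per-document perplexity can be computed as in the following sketch (illustrative Python, not the authors' code; for multiple Gibbs samples one would average the token likelihoods across samples before taking logs):

```python
import numpy as np

def perplexity(doc_words, doc_authors, theta, phi):
    """Exponential of the negative mean log-likelihood of a test document's
    tokens, marginalizing over the document's authors and all topics.

    theta: (A, T) author-topic matrix; phi: (T, V) topic-word matrix.
    """
    token_probs = []
    for w in doc_words:
        # p(w | a_m) = (1/|a_m|) * sum_a sum_z theta[a, z] * phi[z, w]
        p = np.mean([theta[a] @ phi[:, w] for a in doc_authors])
        token_probs.append(p)
    return float(np.exp(-np.mean(np.log(token_probs))))
```

As a sanity check, under a uniform model over a vocabulary of size V the perplexity equals V, which matches the intuition that perplexity measures the model's effective branching factor.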

Conclusions
With a dynamic users' interest discovery model, one can answer many important questions about the content of information uploaded or shared to SNS. Building on our previous work, the Author-Topic over Time (AToT) model [19], which models documents using authors and topics with timestamps, this paper proposes a dynamic users' interest discovery model with a distributed inference algorithm following the main idea of MapReduce, named the Distributed AToT (D-AToT) model. The D-AToT model combines the merits of the AT and ToT models. Specifically, it can automatically detect latent topics, users' interests, and their changing patterns from large-scale social network information. The results on the NIPS dataset show more salient topics and more reasonable users' interest changing patterns. One can generalize the approach in this work to construct alternative dynamic models from other static users' interest discovery models and the ToT model with a distributed inference algorithm. As a matter of fact, our work is currently limited to dealing with users and latent topics with timestamps in SNS. Though the NIPS proceedings dataset is a benchmark for academic social networks, the D-AToT model ignores the links in SNS. In ongoing work, a novel topic model considering the links in SNS will be constructed to identify users with similar interests from social networks.

Appendix Gibbs Sampling Derivation for AToT
We begin with the joint distribution $P(\mathbf{w}, \mathbf{z}, \mathbf{x}, \mathbf{t} \mid \mathbf{a}, \alpha, \beta, \Psi)$. We can take advantage of conjugate priors to simplify the integrals. Consider
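Concretely, exploiting Dirichlet-multinomial conjugacy to integrate out $\Phi$ and $\Theta$, the joint distribution takes the following form (a reconstruction consistent with (1) and with the Beta parameterization used in Section 3; $A_d = |\mathbf{a}_d|$ denotes the number of authors of document $d$):

```latex
\begin{aligned}
P(\mathbf{w}, \mathbf{z}, \mathbf{x}, \mathbf{t} \mid \mathbf{a}, \alpha, \beta, \Psi)
&= P(\mathbf{x} \mid \mathbf{a})\, P(\mathbf{t} \mid \mathbf{z}, \Psi)
   \int P(\mathbf{w} \mid \mathbf{z}, \Phi)\, P(\Phi \mid \beta)\, d\Phi
   \int P(\mathbf{z} \mid \mathbf{x}, \Theta)\, P(\Theta \mid \alpha)\, d\Theta \\
&= \frac{1}{\prod_{d=1}^{D} A_d^{N_d}}
   \prod_{d=1}^{D} \prod_{i=1}^{N_d}
     \frac{t_{d,i}^{\psi_{z_{d,i},1}-1}\,(1-t_{d,i})^{\psi_{z_{d,i},2}-1}}
          {B(\psi_{z_{d,i},1}, \psi_{z_{d,i},2})} \\
&\quad \times
   \left( \frac{\Gamma(V\beta)}{\Gamma(\beta)^{V}} \right)^{T}
   \prod_{z=1}^{T} \frac{\prod_{v=1}^{V} \Gamma\!\left(n_z^{(v)} + \beta\right)}
                        {\Gamma\!\left(n_z + V\beta\right)}
   \left( \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^{T}} \right)^{A}
   \prod_{a=1}^{A} \frac{\prod_{z=1}^{T} \Gamma\!\left(n_a^{(z)} + \alpha\right)}
                        {\Gamma\!\left(n_a + T\alpha\right)}.
\end{aligned}
```

Taking the ratio of this joint for two states that differ only in one token's assignments, and cancelling the shared Gamma factors, yields the conditional in (1).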

Figure 2: The graphical model representation of the AT model.
Figure 3: The graphical model representation of the ToT model.

Figure 4: The graphical model representation of the AToT model.

Figure 6: The distribution of number of publications and research interest evolution for Sejnowski T.
Table 1: Notation.
$\phi_z$: Multinomial distribution of words specific to topic $z$.
$\psi_z$: Beta distribution of timestamps specific to topic $z$.
$z_{d,i}$: Topic associated with the $i$th token in document $d$.
$w_{d,i}$: The $i$th token in document $d$.
$x_{d,i}$: Chosen author associated with the word token $w_{d,i}$.
$t_{d,i}$: Timestamp associated with the $i$th token in document $d$, drawn from Beta($\psi_{z_{d,i}}$).

Table 3: An illustration of 8 topics from a 100-topic solution for the NIPS collection. The titles are our own interpretation of the topics. Each topic is shown with the 10 words and authors that have the highest probability conditioned on that topic. Histograms show how the topics are distributed over time; the fitted Beta PDFs are also shown.