Twitter Can Predict Your Next Place of Visit

The present work focuses on predicting users' next place of visit using their past tweets. We hypothesize that a person's tweets have predictive power over his or her location and can therefore be used to predict the next place of visit. This problem is important for location-based advertising and recommender services. To predict the next place of visit, we calculate the probabilities of visiting different types of places using a bank of binary classifiers and Markov models. More specifically, we train a bank of binary classifiers on past tweets and calculate the probabilities of visiting the next place. Since the bank of binary classifiers is based on a bag-of-words model, we additionally build Markov models over different time durations to account for the last visited place and the time since that visit. Empirical evaluation shows that combining the probabilities obtained from the bank of binary classifiers and the Markov models increases the accuracy of predicting the next place from 65% to 80%.


Introduction
How would life be if we knew the location of any desired place around us anywhere in the world? Though this may seem far-fetched, developments in wireless and location-acquisition technology help us build systems that solve this problem to some extent. Because of location-acquisition technology, it is nowadays not difficult to find the places in a given proximity. The bigger challenge is predicting places for a user according to his need and intent in a given proximity. Microblogging services such as Twitter and Facebook provide a means to know a person's values and needs in a better way.
People share their thoughts and happenings on Twitter along with mundane information. In recent years, the popularity of Twitter has grown exponentially. At the time of writing this paper, Twitter has 300 million users producing 500 million tweets per day (https://about.twitter.com/company). Along with mundane information, people also share their activities: what they are planning, places they have visited, experiences of visited places, with whom they are going, how they are feeling, and so forth. The availability of such valuable information motivates us to build a system that can predict a user's future places of visit according to their intent and behavior.
In this paper, we hypothesize that future places of visit can be predicted by considering two important factors: (i) the previously visited place and (ii) recent tweets. What people write on their timeline reflects their intent and needs, which play a significant role in deciding the next activity to be performed. Similarly, the previously visited place also plays a major role in deciding the next place to be visited. For example, after having lunch people often have coffee. Based on this hypothesis, we propose a novel approach for predicting the next place of visit. We use a bank of binary classifiers (BBC) to predict the probability of visiting the next place using recent tweets only. To account for the last visited place and the time since that visit, we build Markov models (MMs) to predict these probabilities. Both BBC and MMs compute the probability of visiting the next place independently with high accuracy. We show that by combining these two probabilities we can further improve the prediction accuracy. In attaining the goal of this work, we also propose two algorithms: one for tagging tweets with a location and one for finding the optimum number of past tweets used for prediction. Our approach consists of the following steps: (a) assigning a location to tweets if relevant information is present; (b) extracting features from past tweets that are used to train the models; (c) building models using a bank of binary classifiers; (d) using contextual information to enhance the accuracy of predicting the next place. Here contextual information is the previously visited place and the time of that visit.
We crawl more than 4600 Twitter timelines for exhaustive experimentation. Ground truths are extracted from these timelines using the Google Places API and by analyzing tweets for mentions of visited places. We extract features from users' timelines using our proposed algorithm for finding the optimum length of past tweets for prediction. We build models and analyze their performance. Our best model yields 80% accuracy for the top 5 predictions. We also evaluate our approach on new users and show that the performance of the models is similar to that on users seen during training. The major contributions of this work are as follows: (1) building generic models for predicting future places of visit using recent tweets only; (2) building Markov models for predicting the next place to be visited given the previously visited place and the time since it was visited; (3) an ensemble of both models to enhance the accuracy of predicting the next place.
The remainder of the paper is organized as follows. Section 2 presents the proposed approach for predicting the next place of visit in detail. Section 3 discusses the experiments and the analysis of results. In Section 4 we review the relevant literature. Finally, Section 5 concludes the paper.

Proposed Approach
There are two points we want to mention before explaining the four steps of the proposed approach. First, we build a generic prediction model that captures the relationship between the vocabulary used in past tweets and visited places, without considering users' demographics. The main advantage of this approach is that we do not need individual-specific training data for predicting a user's next location. Second, our focus is the category of the establishment, such as restaurant, supermarket, pub, or gym, rather than the specific establishment. By doing this, both establishment owners and users can benefit mutually. For example, if the proposed system predicts a restaurant in a given spatial proximity, then all restaurant owners in the proximity of the user's location can approach the user with their available promotional offers. The user, in turn, has the option to choose a place according to his own interest.
The four steps of our proposed approach are as follows.

Assigning Location to Tweets.
From here onwards we use location and place of visit interchangeably. Assigning a location to a tweet is a two-step process. First, we filter all tweets having geocoordinates and location information. For location information, we use a regular expression ("I'm at" or "@"). Then, using the Google Places API [1] (GPA), we get all the places around the geocoordinates of the given tweet. If the place name present in a tweet is among the places returned by GPA, then that tweet is labeled with the name and categories of the location. Note that GPA also returns the categories of each place, which may be more than one. For matching the place name, we extract the three words following the regular expression from the tweet and compute the ratio r of matched words to the total words in each place name returned by GPA. If r > 0.75, we label the tweet with the first match among the GPA places.
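The labeling rule above can be sketched as follows. This is a minimal illustration, not the exact implementation: `label_tweet` and `match_ratio` are our names, and `nearby_places` stands in for the response of a Google Places API lookup around the tweet's geocoordinates.

```python
import re

def match_ratio(candidate_words, place_name):
    """Fraction of the place-name words that appear among the words
    extracted from the tweet (a simplification of the r > 0.75 rule)."""
    place_words = place_name.lower().split()
    hits = sum(1 for w in place_words if w in candidate_words)
    return hits / len(place_words)

def label_tweet(tweet_text, nearby_places, threshold=0.75):
    """Return (name, categories) of the first nearby place whose match
    ratio exceeds the threshold, else None.

    nearby_places: list of (name, categories) tuples, a stand-in for
    the places returned by the Google Places API."""
    m = re.search(r"(?:I'm at|@)\s*(\S+(?:\s\S+){0,2})", tweet_text)
    if not m:
        return None
    # Up to three words following the "I'm at" / "@" marker.
    candidate_words = set(m.group(1).lower().split())
    for name, categories in nearby_places:
        if match_ratio(candidate_words, name) > threshold:
            return name, categories
    return None
```

In this sketch a tweet such as "I'm at Central Park with friends" matches the place "Central Park" with ratio 1.0 and is labeled with that place's categories.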

Extracted Features.
For predicting the next place of visit, we use past tweets to infer the intent and interest of the user. We observe from the ground truths recovered from timelines that people post activity-related text before performing that activity. For example, a user tweeted, "We are planning to go for some fun," before visiting Central Park in New York.
The time interval of activity-related posts may vary from weeks to hours, depending on the activity to be performed. For example, a user interested in a cricket match happening next month starts tweeting about the event well in advance, whereas a user interested in going to a restaurant in the evening tweets just a few hours before visiting the restaurant. Therefore, we build independent binary classifiers for each category using an appropriate window of past tweets. Here window refers to the time window of past tweets taken to form feature vectors. The window size is found empirically for each classifier/category independently, as explained in the next section.
To form feature vectors, we first label the past tweets (the concatenation of past tweets in the given window) with the categories of the location visited by the user just after posting these past tweets. Note that the location the user has visited can have more than one category, and therefore the past tweets may carry more than one label. It is worth mentioning that, while concatenating past tweets, we do not include the location tweet (the tweet that carries the location information). To build a binary classifier for category C_i, a corresponding data set D_i is formed. To form D_i, those past tweets that are labeled with C_i are treated as positive samples and the remaining ones as negative samples. Hence, for training 100 binary classifiers (the total number of categories), we have 100 corresponding data sets. For example, the toy data set in Table 1 has four different categories. Table 2 shows the four binary data sets D_i, one per category, constructed as explained above.
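The one-vs-rest construction of the binary data sets can be sketched as follows; `binary_datasets` is our naming, and the "positive"/"negative" class values follow the description above.

```python
def binary_datasets(samples, categories):
    """One-vs-rest split.

    samples: list of (text, labels) pairs, where text is the
    concatenated window of past tweets and labels is the set of
    categories of the place visited just afterwards.
    Returns {category: [(text, 'positive'|'negative'), ...]}."""
    datasets = {}
    for cat in categories:
        datasets[cat] = [
            (text, 'positive' if cat in labels else 'negative')
            for text, labels in samples
        ]
    return datasets
```

Each sample appears in every category's data set; only its class value changes, so a place with several categories yields a positive sample in several data sets.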

Bank of Binary Classifiers (BBC) for Predicting Future Location.
For each category, we build an independent binary classifier, and the size of the window of past tweets for each classifier is determined empirically. The steps for determining the window size of past tweets for category C_i are given in Algorithm 1.
In Algorithm 1, for each category, we vary the window size w (in hours) from 6 to 600 hours to construct feature vectors. Then, for each window size w, we form feature vectors that are used to train the binary classifier. Finally, the window size of a particular category is set based on the best classifier performance among all window sizes. For constructing the binary data set of each category, the function getBinaryDataSet(T, C_i, w) is called with the timelines of all users (T), the name of the category (C_i), and the size of the window of past tweets in hours (w). For every location tweet on every timeline in T, we form a feature vector by concatenating the past tweets in the w hours before the location tweet (excluding the location tweet itself) and label this feature vector with the categories of the location visited by the user in the location tweet. Feature vectors having label C_i are marked by the "positive" value of the class attribute in the binary data set and the remaining feature vectors by the "negative" value, as shown in Table 4. Hence, this function returns the binary data set D_i for category C_i. A binary model B_i is then built using the returned data set D_i, and the accuracy of the model is calculated on a validation data set to gauge its performance for the given window size w. Model performance is evaluated in this way for each window size, and the window size yielding the best performance is chosen for that category. This optimum window size of category C_i is denoted by w_i. Once the window size of each category is determined, we use the respective window size in constructing feature vectors for that category. These feature vectors are used in training and prediction.
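The grid search of Algorithm 1 can be sketched as follows. This is a minimal outline under our own naming: `build_dataset` stands in for getBinaryDataSet plus feature extraction, and `train_and_score` stands in for training the binary classifier and computing its validation accuracy.

```python
def best_window_size(candidate_windows, build_dataset, train_and_score):
    """Pick the window size (in hours) with the best validation accuracy
    for one category.

    build_dataset(w): returns (train, validation) data for window w.
    train_and_score(train, val): trains a binary classifier and returns
    its validation accuracy."""
    best_w, best_acc = None, -1.0
    for w in candidate_windows:
        train, val = build_dataset(w)
        acc = train_and_score(train, val)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w
```

Running this once per category, with the category's own data sets, yields the per-category window sizes w_i used later for training and prediction.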
Figure 1 shows the block diagram of predicting places of visit from past tweets. From the past tweets we form the feature vector FV(w_i) using the window size w_i for the ith category determined in Algorithm 1. Then FV(w_i) is used to infer the probability of visiting category C_i. Note that, while predicting the probability of visiting category C_i, we use past tweets depending on w_i for the classifier B_i at prediction time, as it is hypothesized that the history of activity-related tweets depends on the activity to be performed. Therefore, this probability is denoted by P(C_i | W_i). For our purpose, we use Naive Bayes as the binary classifier B_i in training and testing.

Considering Previously Visited Place and Time.
Along with the recent tweets, we consider two other important factors in predicting the next place of visit: the previously visited place and the amount of time since it was visited (i.e., the time duration). For example, predicting a restaurant as the next place of visit within an hour of the user having just visited a restaurant is not considered a good prediction. To track the duration of time between visits to the previous and next places, we discretize time into one-hour slots. We capture human visiting behavior in a very simplified manner by building a Markov chain for each time duration; that is, for each time duration, we form a Markov chain and a transition matrix. In the transition matrix M_h, the entry M_h(i, j) represents the probability of visiting place j after place i within the time interval [h, h + 1) hours. Let N_h(i, j) be the number of times users visited place j after place i within the time interval [h, h + 1) hours; then

M_h(i, j) = N_h(i, j) / Σ_k N_h(i, k).  (1)

We form a transition matrix for each time duration h (in hours) in the set {1, 3, 6, 12, 24}. The probability of visiting the next category C_i after h hours, given that the previously visited location has categories C_a C_b ⋯ C_x (abbreviated C_{ab⋯x}; here a, b, …, x are variables that can stand for any category), is calculated as follows. Recall that a visited location may have more than one category:

P_h(C_i | C_{ab⋯x}) = P_h(C_{ab⋯x} | C_i) P(C_i) / P(C_{ab⋯x}).  (2)
Considering the independence assumption between the previously visited categories, we can write (2) as

P_h(C_i | C_{ab⋯x}) = P(C_i) ∏_{j ∈ {a,…,x}} P_h(C_j | C_i) / P(C_{ab⋯x}).  (4)

Applying Bayes' rule to each factor P_h(C_j | C_i) and dropping terms that do not depend on C_i, we can write (4) as

P_h(C_i | C_{ab⋯x}) ∝ ∏_{j ∈ {a,…,x}} P_h(C_i | C_j) / P(C_i)^(m−1),  (5)

where m is the number of categories of the previously visited location, the P_h(C_i | C_j) are the conditional probabilities estimated in (1), and P(C_i) is estimated as

P(C_i) = N(C_i) / Σ_k N(C_k),  (6)

where N(C_i) represents the number of times users visited category C_i in the training set. Equation (5) uses information on both factors, the previously visited place and the time elapsed since that visit, by picking the appropriate transition matrix. For brevity, we refer to this Markov modeling, which also considers time information, as MMs.
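A minimal sketch of the MMs computation, under our reading of the equations above: dictionaries stand in for the transition matrix M_h, and `next_place_prob` returns the unnormalized independence-combined score, so values can exceed 1 before normalization. Function names are ours.

```python
from collections import defaultdict

def transition_matrix(visits):
    """Estimate M_h(i, j) = N_h(i, j) / sum_k N_h(i, k) from a list of
    (prev_category, next_category) transitions observed within one
    time bucket h."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in visits:
        counts[prev][nxt] += 1
    matrix = {}
    for prev, row in counts.items():
        total = sum(row.values())
        matrix[prev] = {nxt: n / total for nxt, n in row.items()}
    return matrix

def next_place_prob(matrix, prev_cats, prior, cat):
    """Unnormalized score for visiting `cat` given the categories of
    the previously visited place:
        prod_j M_h(c_j, cat) / prior(cat)^(m - 1),
    the naive-independence combination sketched above."""
    score = 1.0
    for c in prev_cats:
        score *= matrix.get(c, {}).get(cat, 0.0)
    m = len(prev_cats)
    if m > 1:
        score /= prior[cat] ** (m - 1)
    return score
```

In practice one matrix is estimated per time bucket h ∈ {1, 3, 6, 12, 24}, and the bucket matching the observed duration is passed in as `matrix`.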
In order to predict the next place of visit using both the past tweets and the time since the previous categories were visited, we combine the two probabilities P_h(C_i | C_{ab⋯x}) (estimated in (5)) and P(C_i | W_i) (refer to Figure 1) under the assumption of independence. We refer to the combined classifier P(C_i | W_i, C_{ab⋯x}) as CC for brevity. Consider

P(C_i | W_i, C_{ab⋯x}) = P(W_i, C_{ab⋯x} | C_i) P(C_i) / P(W_i, C_{ab⋯x}).  (7)

As W_i and C_{ab⋯x} are assumed to be independent, we can write (7) as

P(C_i | W_i, C_{ab⋯x}) = P(W_i | C_i) P(C_{ab⋯x} | C_i) P(C_i) / P(W_i, C_{ab⋯x}),  (8)

which, after applying Bayes' rule to each factor and dropping terms that do not depend on C_i, simplifies to

P(C_i | W_i, C_{ab⋯x}) ∝ P(C_i | W_i) P_h(C_i | C_{ab⋯x}) / P(C_i).  (10)
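The combination step can be sketched as follows, assuming the combined score takes the naive-Bayes form P(C | W) · P(C | C_prev) / P(C); the function names are ours, and the inputs are the per-category probabilities produced by BBC and MMs.

```python
def combined_score(p_text, p_markov, prior, cat):
    """Unnormalized combined score for one category:
    P(C | W) * P(C | C_prev) / P(C), assuming W and C_prev
    are independent given C."""
    return p_text[cat] * p_markov[cat] / prior[cat]

def rank_categories(p_text, p_markov, prior, k=5):
    """Top-k categories by the combined score, in nonascending order."""
    ranked = sorted(
        p_text,
        key=lambda c: combined_score(p_text, p_markov, prior, c),
        reverse=True,
    )
    return ranked[:k]
```

Since only the ranking of categories matters for top-k prediction, the proportionality constant in (10) can be ignored.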

Experiments
We have performed exhaustive experiments on a data set crawled from Twitter using the publicly available Twitter API. We collected the timelines of 4606 users who tweeted at least once from New York between April 24, 2014, and April 29, 2014. Each timeline contains approximately 3200 recent tweets, and the tweet rate of these users is at least 20 tweets per day. Among the 4606 users, the number of location tweets per timeline is highly variable. Though we started from New York, the locations visited by the users in our data set span the world.

Data Sets.
For evaluation, we divide our data set into two subsets according to the number of location tweets on a user's timeline. Among the 4606 users, those who have more than 60 location tweets (tweets that carry location information) on their Twitter timeline form the first subset, named Data Set 1, and the remaining users form the second subset, named Data Set 2.

Evaluation.
We first present the methods used to compute the accuracies of the proposed models one by one in the following subsections and then propose a few baselines for comparing our proposed models.

Using Markov Models (MMs) Only.
We form the transition matrices M_h from the training data set, where h ∈ {1, 3, 6, 12, 24} hours, as explained in (1). Using the ground truth of the test sets, we compute the accuracy of the model as follows: (1) For every test instance, we first find both the time duration and the categories the user visited just before prediction. Here a test instance is the location of a tweet that we want to predict.
(2) The duration is then discretized to an integer value h such that, if the duration lies within the time interval [h, h + 1), the transition matrix M_h is used for computation in step (3).
(3) If the previously visited categories are C_{ab⋯x}, then using M_h in (5) we estimate the probability of visiting each category present in the training data set.
(4) Considering the top k categories by predicted probability, if the ground truth is recovered, the test instance is counted as correctly classified. The accuracy is

Accuracy = n_c / n,  (11)

where n_c represents the number of test instances accurately classified and n represents the total number of test instances.
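The top-k accuracy computation described above can be sketched as follows; `accuracy_at_k` is our naming, and a test instance counts as correct when any ground-truth category of the visited place appears among the top-k predicted categories.

```python
def accuracy_at_k(predictions, ground_truths, k):
    """Acc@k over a test set.

    predictions: list of ranked category lists, one per test instance.
    ground_truths: list of category sets (a place can have several
    categories), one per test instance."""
    correct = sum(
        1 for ranked, truth in zip(predictions, ground_truths)
        if set(ranked[:k]) & truth
    )
    return correct / len(predictions)
```

The same routine evaluates MMs, BBC, and the combined classifier CC, since each of them outputs a ranking of categories per test instance.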

Using Bank of Binary Classifiers (BBC).
Using the ground truth of the test sets, we compute the accuracy of the model as follows: (1) For a given test instance (location tweet), we form the feature vector FV(w_i) for each category C_i as described in Section 2.3. We compute the probability P(C_i | W_i) of visiting category C_i using the binary classifier B_i built in the training phase.
(2) Considering the top k categories according to the probabilities predicted by the binary classifiers, if we recover the ground truth, we count the test instance as correctly classified. The accuracy of this classifier is calculated as in (11).

Using P(C | W, C_{ab⋯x}) (CC).
The evaluation method is similar to that illustrated for BBC. For every test instance, we compute the probability of visiting each category using (10). Considering the top k categories in nonascending order of probability, if we recover the ground truth, we count the test instance as correctly classified. The accuracy of this model is also calculated as in (11).

Baselines.
Our proposed approach is generic and can be applied to both seen and unseen (new) users. To the best of our knowledge, there is no literature that predicts the future place of visit based only on recently used words and the latest visited location, without using the user's demographics. Hence, for evaluating the performance of the models, we propose the following baselines.

Baseline Model 1 (BM1).
In this baseline, we consider the most frequent check-in category in the training data. Let N(C_i) be the number of times users visited category C_i in the training set. Then the probability of visiting C_i is estimated as P(C_i) = N(C_i) / Σ_k N(C_k).
Baseline Model 2 (BM2). Markov models are built to predict the next place of visit based on the latest location visited by the user, say C_j. The probability of visiting C_i is estimated as the fraction of the number of visits to C_i immediately after C_j over the total number of visits to any category after C_j; that is, P(C_i | C_j) = N(C_j, C_i) / Σ_k N(C_j, C_k), where N(C_j, C_i) represents the count of visits to category C_i immediately after C_j.
Here, we want to mention that in Section 2.4 we form the Markov models (MMs) considering the amount of time between consecutive visits. Therefore, this baseline model uses only one transition matrix, whereas the MMs use different transition matrices depending on the granularity of time.
Baseline Model 3 (BM3). As described in Section 2.3, we discussed the need for a different window size for each category when predicting the future place of visit. This baseline is defined to show the value of our approach for deriving a window size per category. Here, we use a fixed window size for every category and compare the performance against the approach of Algorithm 1. As in BBC (Section 2.3), Naive Bayes is used as the binary classifier B_i (see Figure 1) for training and testing. We use five different window sizes: 5, 10, 20, 30, and 100 hours.

Results.
We conduct two sets of experiments. In the first set, we compare the performance of the model BBC proposed in Section 2.3 with Baseline Model 3 (BM3). In the second set, we compare the performances of the proposed models with Baseline Models 1 and 2.

P(C | W) versus BM3.
The objective of this set of experiments is to show that BBC performs better when feature vectors are constructed using the window sizes derived by Algorithm 1 for training and testing. Recall that the window size derived for category C_i may differ from that of category C_j, where i and j are arbitrary categories, as the window size depends on the performance of the binary classifier.
First, we show the performance of the classifier with the same window size for each category (BM3). One by one, we use five different window sizes: 5, 10, 20, 30, and 100 hours. Results are shown in Figure 2, where Win_w represents the performance of BM3 when the window size of past tweets is set to w hours. From these results, we observe that BM3 performs best when the window size is 10 hours for each category. Also, BM3 performs similarly on both test sets, seen (TS1) and unseen (TS2), which validates that this classifier can be applied to a wide variety of users. These results also support our hypothesis that the words in tweets posted by users are highly correlated with their future activity or its location. By learning the relation between words and locations, BM3 infers the future location from words with high accuracy.
The results shown in Figure 2 validate the existence of a relationship between words and future location. By inspecting the text in the windows, we found that using the same window size (BM3) for all categories does not model the problem appropriately. If the window size is 5 hours, the words are very few, which results in erroneous predictions. But if we increase the window size too much, words may appear that have no role in predicting the current location. For example, a tweet like "now looking for some fun" posted 30 hours earlier has no significance in deciding the current activity. From the above results, the optimum window size for BM3 is 10 hours.
For appropriate modeling, as discussed in Section 2.2, we derive a window size for each category using Algorithm 1 for training and testing. We compare the two approaches, BM3 (10 hours) and P(C | W). The results in Figure 3 show that we can enhance the prediction accuracy by considering an appropriate duration for the tweets' window.

Comparison of Proposed Models with BM1 and BM2.
In this section, we compare the proposed models with BM1 and BM2. Our proposed models outperform both baselines, as shown in Figure 4. We can see that the MMs perform much better than BM2. The reason for this improvement is that BM2 uses only one transition matrix with no time information in it; thus, the only input to BM2 is the previously visited category. When we use the MMs for prediction, we have two input parameters: the previously visited category and the time duration since it was visited. Based on the time duration, we pick the corresponding transition matrix for computing predictions. The information on visiting preference according to time is lost in BM2, hence the erroneous predictions. P(C | W) is explained in detail in Section 3.3.1. From the accuracies given by the MMs and P(C | W), we can infer that recent tweets and the previously visited category with its duration play almost equal roles in predicting the next place of visit. But when both model probabilities are combined under the independence assumption, the prediction rate improves significantly, as seen in Figure 4. From this improved accuracy, we can infer that the accuracy of the proposed model can be further enhanced with more contextual information about the user, which helps the model understand the user better.

Related Work
In the last decade, researchers have used Twitter data for predicting various future outcomes. Bollen et al. [2] used tweets to predict the stock market index with high accuracy. Similarly, Asur and Huberman [3] predicted the box-office revenues of movies in advance using related tweets.
Tweets have also been explored heavily for inferring user interests for commercial purposes. References [4][5][6] used different techniques over tweets for inferring user interest and show that tweets contain a lot of valuable information related to user interests.
Twitter data has also been used for recommendation. For example, Sadilek et al. [7] used tweets to recommend restaurants that a user should not go to. References [8][9][10] proposed recommender systems for news recommendation by modeling the user profile and exploiting the tweet-news relationship.
With the exponential growth in the usage of smartphones, users now publish millions of tweets from anywhere and at any time. Because of this, another field has emerged that studies the mobility prediction of users; that is, where will the user be, or where was he when the given tweets were published? References [11][12][13] have shown that Twitter data can be used to predict the location of a user with high accuracy, but the granularity of these predictions is at the country or regional level. For example, Han et al. [14] predict user location, that is, country or region, by identifying location-indicative words (i.e., frequent words used at a specific location), in contrast to our approach, where we predict locations such as shops, churches, and restaurants, at a much finer granularity. Some efforts have been made to predict user locations such as restaurants and shops, but the data used there are generated by Location Based Social Networks (LBSN). Using LBSN data for prediction is very different from using Twitter, as Twitter data is very unstructured and challenging in comparison to the structured data of LBSN. References [15][16][17][18][19] have explored LBSN (Foursquare) for recommending or predicting the next location for the user. Others have used text published by users to derive personality traits and made recommendations based on common traits. For deriving personality traits, Linguistic Inquiry and Word Count (LIWC) [20] has been used frequently. References [20][21][22][23][24][25][26][27][28][29] have shown that the lexicons people use can reveal their personal values and how these traits can be used for recommendation. Though these approaches have been used extensively in analyzing personality traits, they also have the shortcoming of relying on predefined word-category correlations. Alternatively, Schwartz et al. [30] used a rather different, open-vocabulary technique (using the whole set of words available on social media), in comparison to the closed-vocabulary technique LIWC [20], where predefined sets of words are used for deriving personality traits. In their study, they showed that the open-vocabulary approach achieves higher, state-of-the-art accuracy in predicting gender compared to LIWC by exploiting latent factors that the closed-vocabulary approach does not capture. Motivated by this, we use all words available on timelines for modeling users' behavior.

Conclusion
In the present work, we study the problem of predicting the next place of visit using tweets and propose a methodology for doing so. For the experiments, we crawled more than 4600 users' timelines from Twitter. For modeling and generating ground truths, we labeled those tweets with visited locations where relevant information was present, using a simple pattern-matching technique that labels tweets with high accuracy. We also proposed an algorithm for deriving the optimal window length per category for deriving the feature vectors used in training and testing the BBC (bank of binary classifiers). From this trained model, BBC (Naive Bayes), we compute the probabilities of visiting the next place. To account for the time of the last visited place, we trained Markov models that also compute the probabilities of visiting the next place. Assuming independence of the probabilities produced by the bank of binary classifiers and the Markov models, we combined the two. From the experiments, we found that the accuracy of predicting the next place increased from 65% to 80%. This shows that our model can potentially be used in location-based advertising, intelligent resource allocation, and so forth. Also, the similar performance of the proposed model on both seen and unseen data sets makes it applicable to a wide variety of users.

Figure 1 :
Figure 1: Predicting the probability of visiting category C_i given past tweets in a window of size w_i.

(a) Test Set 1 (TS1): we use the remaining latest 50 location tweets of each user from Data Set 1 to form feature vectors.
(b) Test Set 2 (TS2): we use all location tweets from Data Set 2 to form feature vectors.

Figure 4 :
Figure 4: Comparison of proposed models with baselines.

Table 1 :
Toy data set having five instances and four different categories.

Table 2 :
Four binary data sets, one per category, constructed using the data set given in Table 1. Here − denotes the negative samples. (a) Data set D_1 for category C_1.

Algorithm 1: Determining the window size of past tweets used to construct feature vectors for each category. Input: T, the timelines of all users after labeling (Section 2). Output: the window size w_i for each ith category.

Training Data Set (TDS). From Data Set 1, we use the oldest n − 50 location tweets of each user for training the binary models, where n is the total number of location tweets available on their timeline. Tweets on timelines are ordered in reverse chronological order. The division of users is shown in Table 3.
Testing Data Sets. For evaluating the performance of the models on both seen and unseen users, we use two different sets for testing, named Test Set 1 and Test Set 2.

Table 3 :
Division of users according to the number of location tweets on their timelines.

Table 4 :
Description of training and testing data sets.