An Empirical Study on Evolution of the Connectivity for VANETs Based on Taxi GPS Traces

Network connectivity is a fundamental requirement for intervehicle communication and services in VANETs. The fast movement of the vehicles results in rapid changes in network topology generating dynamic variation in network connectivity, which causes dynamic variation in network connectivity. In this paper, we analyze the evolution of the VANETs connectivity based on the realistic data collected by the GPS devices installed on taxis in Shanghai. The study of the evolution characteristics shows that the number of connected components, the size of the giant component, and the connectivity length time series possess strong temporal correlation and long-range dependence. As a result, the short-range dependent processes such as the exponential, normal, or Poisson distributions cannot capture the statistical distribution characteristics. Moreover, all connectivity parameters cannot be described by unimodal distributions. We also conclude that although the whole topology of the VANETs is broken into a large number of small-sized connected components under small transmission radius, the size of the giant component keeps stable when transmission radius reaches the threshold. These results will be helpful to optimize the RSUs deployment and transmission radius to keep the giant component stable and design more efficient strategies to forward the information in VANETs.


Introduction
Vehicular Ad Hoc Networks (VANETs) are considered to be a special application of infrastructureless wireless Mobile Ad Hoc Networks (MANETs) [1].VANETs have some distinctive characteristics compared with MANETs like road pattern restrictions, large network size, variable network density, highly dynamic topology, infinite energy supply, localization functionality, high computational ability, and so on [2,3].Two communication types can be distinguished in VANETs: one is called Vehicle-to-Vehicle (V2V), in which vehicles can exchange information when they are located in the communication range of each other.The other one is called Vehicleto-Infrastructure (V2I) communications in which vehicles are connected to fixed stations located on the roadsides for information exchange or Internet access.The IEEE 802.11p working group is responsible for the standardization for V2V and V2I communications, while the entire communication protocol stack is being standardized by Wireless Access in Vehicular Environments (WAVE).
Connectivity is one of the most important properties for information dissemination in VANETs, since it represents the available paths for communication.In real applications of VANETs, vehicles move with different speeds and directions; hence, potential connectivity is constantly changing.Thus, network connectivity is a crucial research challenge for achieving VANETs applications.
There are many studies to analyze the connectivity of VANETs.These studies can be broadly divided into three categories.The first category is aimed at developing a modelbased theoretical analysis of network connectivity.With exponentially distributed intervehicle distances [4,5], use a queuing theoretic approach for the connectivity analysis.In [6], the authors have introduced a dynamic connectivity graph and applied it to analyze properties of Poisson network.Zhuang et al. [7] derive the exact expression for the average size of the connected components and their size distribution for one-dimensional VANETs.The authors in [8] use a new approach of using Exponential Random Geometric Graphs (ERGGs) to analyze the dynamic connectivity of VANETs.These theoretical analysis are useful in discussing the critical conditions or the connectivity boundary.But one of the main limitations of abovementioned works related to VANETs connectivity is that these theoretical models still rely on 2 International Journal of Distributed Sensor Networks strong assumptions, which are not consistent with real cases.For example, although an exponential distribution is a good approximation for the distribution of intervehicle distance, it is shown that assuming the intervehicle distance distribution is exponential will overestimate the connectivity probability by a significant amount [9].
The second category is a mobility model-based technique for analyzing network connectivity for VANETs.Conceic ¸ão et al. [10] contrast the connectivity profiles obtained from popular mobility models such as the random way point model and the Manhattan grid mobility model and reveal some of the deficiencies of the existing mobility tools for VANET research.Through simulations based on real bus routes in central London, Ivan et al. [11] study the connectivity of vehicular ad hoc networks that consist of buses moving in urban area.Hammouda et al. [12] analyze the mobility models' impact on the topological characteristics of VANETs by using different connectivity metrics such as lobby index, betweenness centrality, and local clustering coefficient.Although these mobility models based analyses are useful in evaluating the connectivity properties for VANETs, these artificial vehicular traces are still generated by mathematical models and cannot accurately represent real-world vehicular mobility traces.
The third category of connectivity analysis is based on validated microscopic mobility models [13][14][15][16][17] and realworld taxi GPS dataset [18].In [13], Pallis et al. perform a thorough analysis of the vehicular network topology in Zurich, Switzerland.They employ a Multiagent Microscopic Traffic Simulator (MMTS) to generate large vehicle trace data and then take a detailed study about the time-evolving connectivity.Applying realistic mobility traces, [14] conducts node-level and network-level analysis including connectivity, availability, reliability, and navigability of the network.Grzybek et al. [15] analyze the connectivity and the potential for creating communities in large-scale and realistic VANETs.Ferreira et al. [16] perform an urban connectivity analysis of VANETs through stereoscopic aerial photography based on a stereoscopic aerial survey over the city of Porto.Their results show significant differences compared to less realistic mobility models used for VANETs connectivity analysis.Glacet et al. [17] leverage an evolving graph-theoretical approach to investigation on the temporal connectivity of store-carryand-forward in vehicular networks.Their results show that even though it cannot allow full network connectivity, storecarry-and-forward improves significantly the instantaneous connectivity attained through simple connected forwarding.Reference [18] explores the features of connected components for vehicular network based on two real taxi GPS traces.The analysis results reveal that the whole topology of VANETs consists of a large number of small-sized connected components.They find that the biggest connected component of VANETs is relatively stable even though each individual vehicle keeps moving.
Although there are many research results on the connectivity of VANETs, there is lack of the research on the dynamic evolution of VANETs, which is the goal of this paper.Because the VANET is dynamic or evolving with vehicles and edges appearing and disappearing over time.In this paper, we present our main findings by studying the evolution of VANETs connectivity over time based on real taxi GPS traces collected from Shanghai, China.The remainder of this paper is organized as follows.In Section 2, the temporal network model and connectivity metrics are given.The real vehicle mobility traces are introduced in Section 3. The evolution of VANETs connectivity will be discussed in detail in Section 4. The last section is for a brief summary and discussion.

Methodology
2.1.Temporal Networks Model.We model the network topology at time  as a graph () = ((), ()), where vehicles (hereinafter also referred to as nodes) correspond to the set of vertices () = {V  } and communication links to the set of time-dependent edges () = {  ()}.We make the assumption that every node has the same signal transmission radius .An edge   () exists if the distance    () between every two nodes V  and V  at time  is less than or equal to ; that is, For evaluating the temporal evolution of the VANETs, its subgraphs are considered at annually divided time intervals.The dynamic networks discretize time by converting temporal information into a sequence of  network "snapshots."We use  to denote the time duration of each snapshot (or time window size), /, expressed in some time unit (e.g., seconds or hours) [19].In other words, a dynamic network can be represented as a series of static graphs (1), (2), . . ., ().The notation () (1 ≤  ≤ ) represents the th temporal snapshot of the dynamic network.Next, we will study the instantaneous connectivity of the network by considering each of such graphs separately and then analysis of the dynamic evolution of connectivity over time and provide valuable insight into VANETs topology characteristics.

Connected Component.
In graph theory, a connected component of an undirected graph is a subgraph in which there exists at least one path for any two vertices [14].We denote the shortest multihop communication path between nodes V  and V  at time  as the ordered sequence of nodes in the shortest path from We define the subgraph   () = (  (),   ()) as the component within which there are nodes V  at time .In other words, the component   () is the graph representing the portion of the network that vehicle V  can reach via multihop communication at time .Therefore, packets are forwarded if the source node and the destination node are in the same connected component.
The level of connectivity of the whole vehicular network is mainly characterized by the number of components which is an index of the level of network fragmentation.We define the set of the components present in the network at time  as C() = {  () |   () ∩   () = ⌀, ∀ < }.The number of components in the network at time  is then () = ‖C()‖.The average number of components is  = (1/) ∑  =1 ().The size of the component,   () = ‖  ()‖, is the number of nodes that belong to it.Then the connectivity probability () at time  is defined as () = max ∈()   ()/(); that is, the connectivity probability is the number of nodes in the giant component divided by the number of nodes in the whole network at time .The average connectivity probability is  = (1/) ∑  =1 ().

Real Vehicle Mobility Traces
The traces are captured by the Shanghai Taxi GPS System.The data includes the traces of 4316 taxis moving around in Shanghai within 24 hours.Each trace consists of a report periodically sent back to the data center via a GPS-enabled device onboard the taxi.Each report gives the vehicle's ID, coordinates (longitude and latitude) of the current location, timestamp, moving speed, direction, and operational status [21].Since taxi GPS devices are originally deployed to monitor and schedule taxis and the GPRS messages that carry the GPS data will be charged, taxis prefer to report the GPS data at a long time interval.Figure 1 exhibits a part of mobility traces of taxis in the urban area.
Although the report intervals are not regular, varying from 30 seconds to few minutes, the average value is about 60 seconds.In order to obtain fine-grained (one second) location information, one way is to use the linear interpolation method by assuming a constant speed of this object traveling between adjacent positions in the road network.We then use MATLAB to convert the longitude and latitude coordinates into plane rectangular coordinates by Gauss-Kruger projection [22].Figure 2 shows the comparison of a taxi's GPS raw data and fine-grained interpolation data converted by Gauss-Kruger projection.We find that only 2238 taxis are moving, and the rest of the taxis remain static for 24 hours.Therefore, we analyze evolution of VANETs connectivity using the traces of 2238 taxis.
The 75 MHz of Dedicated Short-Range Communications (DSRC) spectrum at 5.9 GHz has been allocated for vehicular communications in many countries.Due to the collisions and environmental interferences in the urban area, the real DSRC radius is considered as 300-500 m [23] for stable signal strength.However, according to the general specifications in current Vehicular Communication Systems, the DSRC-based devices on vehicles can achieve a maximum communication range of 1000 m [24].To explore the variance according to different communication radius, we choose five communication ranges from 100 m to 900 m.

The Evolution of VANETs Connectivity
The VANET evolves over time with vehicles and edges appearing and disappearing over time.The fast movement of the vehicles results in rapid changes in network topology generating dynamic variation in network connectivity.Four consecutive network topology graphs captured are shown in Figure 3, which depicts that VANETs topology changing affects the physical network connectivity.Each red spot denotes a vehicle and a blue line connecting two spots indicates that the two vehicles can communicate with each other when the signal transmission radius is 300 m.In this paper, we investigate the time evolution of network connectivity of the VANETs with a 30 seconds' granularity.

Connected Component.
The evolution of the number of connected components for different signal transmission radius  is shown in Figure 4, when all samples are over 24 hours.As expected, we observe that the number of components decreases as the signal transmission radius increases.When  = 100 m, the average number of components is 1909 and the average number of vehicles is 2222.This suggests that the vehicular network is highly partitioned into many separate node groups unable to communicate with each other.As shown in Figure 5, with the transmission radius increasing, the average number of components is reducing quickly.The results of the statistical analysis show clearly that their relationship can be well fitted by the power-law distribution function () =   +, where  = 2.049 + 004,  = −0.426, = −965.7,and the 95% confidence bounds of the three coefficients are (−2.661+ 004, 6.759 + 004), (−1.156, 0.3037), and (−4162, 2231), respectively.Moreover, the goodness of the fitting parameters are as follows: SSE: 8843, -square: 0.9956, adjusted -square: 0.9911, and RMSE: 66.49.It is often suggested that a more quantitative way to view time evolution is proposed to look at the autocorrelations of the number of components.The theoretical autocorrelation function (hereafter, ACF) is defined as follows: corr((), ( − )) = cov((), ( − ))/ √var(())√var(( − )), where cov((), ( − )) is the autocovariance and var(()) is the variance.Figure 6 are plots of the autocorrelation functions of the number of components.It is easy to see that the autocorrelation functions do exhibit a much slower decay behaviour except  = 100 m.We conclude that the number of components themselves do show the evidence of temporal correlations and long-range dependency.Therefore, the shortrange dependent processes such as the exponential, normal, or Poisson distributions cannot capture the distribution characteristics.
The short-range dependent processes cannot capture the distribution characteristics of the number of connected components.Which distribution function can describe the statistical characteristics of the number of connected components in general?We use kernel density estimation (KDE) methods to estimate the density of the number of connected components.From Figure 7, we can see that almost all the number of connected components exhibits new probability density function with several different shapes including bimodal shape except  = 100 m.In statistics, a bimodal distribution is a continuous probability distribution with mixture of two different unimodal distributions.Therefore, the number of connected components cannot be described by unimodal distributions.

Giant Component and Connectivity
Probability.Not only the availability but also the reliability of large components is a critical factor for the design of vehicular network solutions.Figure 8 illustrates the size of the giant component over time for different transmission radius.Figure 8 plots the average connectivity probability as a function of transmission radius.From Figures 8 and 9, we can see that almost no giant components appear during 24 hours when  = 100 m.Even when  = 300 m, not only the giant component is less probable to appear, but it never includes more than 18% of the network vehicles.The average  size of giant components of total vehicles is 6.71%.When the radius increases from 300 m to 700 m, the connectivity probability relies on linear growth.Conversely, when the transmission radius is larger than 700 m, the giant component keeps stable and even contains 70% to 80% of the vehicles.
Furthermore, we examine the evolution of the giant component over time.The autocorrelation function (ACF) of the size of giant component time series for different transmission radius is shown in Figure 10.Their autocorrelation functions for different  show a slowly decaying tail and remain significantly positive over many time lags.We conclude that the size of giant component exhibits strong temporal correlations and long-range dependency.We also use kernel density estimation methods to estimate the density of the size of giant component.From Figure 11, we can see that almost all the sizes of giant component exhibit new probability density function with several different shapes including bimodal shape.The size of giant component cannot be described by unimodal distributions.
Key Networking Insights.Comparing Figure 4 with Figure 5 and Figure 8 with Figure 9, although we can conclude that the whole topology of the VANETs is broken into a large number of small-sized connected components, the giant component can cover a large number of vehicles.Moreover, the giant component keeps stable when transmission radius reaches the threshold.Therefore, on the one hand, if the giant component keeps stable, we can make use of this feature by keeping important information on the vehicles in the giant component and design strategies to forward the information with the giant component.On the other hand, we consider that roadside unit (RSU) deployment is critical to achieve large components of the connected vehicles.In particular, RSUs should be deployed where weak ties tend to appear, so as to act as permanent bridges among small but stable connectivity islands.The problems about optimization of RSUs deployment and transmission radius will be developed in our future work.

Connectivity Length.
The connectivity length over time is reported in Figure 12.When the transmission radius  increases from 100 m to 900 m, the average connectivity length decreases sharply from 5108.9 to 13.5447.Figure 13 suggests that the connectivity length is strongly affected by transmission radius.The relationship between the average connectivity length and transmission radius can be quite well fitted by the power-law distribution function () =   as the coefficients  = 3.134 + 008 and  = −2.394 of which the 95% confidence bounds are (−8.856+007,7.153+ 008) and (−2.672, −2.116) correspondingly.And the goodness of the fitting parameters is as follows: 4703 (SSE), 0.9998 (-square), 0.9997 (adjusted -square), and 39.59 (RMSE).Though the transmission radius is large ( = 900 m), the average global efficiency is 0.0762.This means that the efficiency of the global communication in VAENTs is low.This result shows that VANET is not smallworld networks since the small-world networks have high global efficiency, that is, are very efficient in global communication.
Figure 14 is plots of the autocorrelation functions of the connectivity length.The strong temporal correlation of the connectivity length time series can be observed for any  in Figure 13.In particular, when the signal transmission radius is over 700 m, the global efficiency appears to be almost the same correlation coefficient.The slowly decaying autocorrelation function implies that the global efficiency possesses a strong temporal correlation and long-range dependence across various transmission radius.Moreover, the connectivity length also cannot be described by unimodal distributions.
We analyze the influence of various time windows on evolution of the connectivity parameters.The ACF of the connectivity parameters such as the number of connected components, the size of giant component, and the connectivity length is reported in Figures 15(a), 15(b), and 15(c), respectively, where we set the signal transmission radius as 300 m.As shown in Figure 15(a), with the time window increasing, the temporal correlation of the number of connected components is reducing quickly; that is, the smaller the time window is, the stronger the correlation is.Similar results are obtained for the size of giant component and the connectivity length as well.

Conclusion
In this paper, we analyzed the evolution of the connectivity for VANETs based on real taxi GPS traces in Shanghai.We found that the number of the connected component, the size of the giant component, and the connectivity length time series possess strong temporal correlation and long-range dependence.The short-range dependent processes such as the exponential, normal, or Poisson distributions cannot capture the statistical distribution characteristics.Moreover, all connectivity parameters cannot be described by unimodal distributions.The relationship between the average number of components and the transmission radius is similar to the relationship between average connectivity length and transmission radius in that it is described by a power-law function.We also found that, by adopting a reasonable communication range, the size of the giant component is stable.These results will be significant to optimize the RSUs deployment and transmission radius to keep the giant component stable and design more efficient strategies to forward the information in VANETs.
Our study still has limitations that open the way for our future research activities.First, our analysis is based on taxi GPS data collected in Shanghai urban area.Shanghai is the largest city by population in China.Compared with the city size, the taxi GPS data is relatively smaller.In our future work, we plan to investigate the connectivity evolution for networks where all the vehicles in Shanghai's urban areas will be contained by validated microscopic mobility models.Second, our connectivity analysis builds on a unit-disc representation of the radio signal propagation.Considering some model of signal fading would add the rapid variability induced by radio signal fluctuations on top of the connectivity dependent dynamics we observe in our study.Third, in this paper we only investigate the evolution of the instantaneous network connectivity.The temporal connectivity of vehicular delay-tolerant networks is very important for data dissemination.Therefore, we will provide

Figure 2 :XX
Figure 2: A comparison of a taxi's GPS raw data and fine-grained interpolation data.

Figure 4 :Figure 5 :
Figure 4: Temporal evolution of the number of connected components.

2 Figure 6 :
Figure 6: The ACF for the number of connected components.

Figure 7 :
Figure 7: The kernel density of the number of connected components time series.

Figure 8 :
Figure 8: Temporal evolution of the size of giant component.

Figure 9 :
Figure 9: The average connectivity probability versus transmission radius.

2 Figure 10 :
Figure 10: The ACF for the size of giant component.

Figure 11 :
Figure 11: The kernel density of the size of giant component time series.