An Online-Traffic-Prediction Based Route Finding Mechanism for Smart City

Finding fastest driving routes is significant for the intelligent transportation system. While predicting the online traffic conditions of road segments entails a variety of challenges, it contributes much to travel time prediction accuracy. In this paper, we propose O-Sense, an innovative online-traffic-prediction based route finding mechanism, which organically utilizes large scale taxi GPS traces and environmental information. O-Sense firstly exploits a deep learning approach to process spatial and temporal taxi GPS traces shown in dynamic patterns. Meanwhile, we model the traffic flow state for a given road segment using a linear-chain conditional random field (CRF), a technique that well forecasts the temporal transformation if provided with further supplementary environmental resources. O-Sense then fuses previously obtained outputs with a dynamic weighted classifier and generates a better traffic condition vector for each road segment at different prediction time. Finally, we perform online route computing to find the fastest path connecting consecutive road segments in the route based on the vectors. Experimental results show that O-Sense can estimate the travel time for driving routes more accurately.


Introduction
With the development of intelligent transportation technology in smart cities, fast route finding is widely used in traffic flow coordination [1] to optimize traffic management plans and in urban computing [2][3][4][5]. Finding fast driving routes also helps save energy and relieve traffic congestion.
Existing fastest-route finding approaches usually look into real-time traffic conditions or infer potential travel costs of roads by mining historical trajectory data. However, these works mostly assume a static travel cost on each road and ignore potential temporal dynamics. In fact, the driving time calculated from the current travel time of each road segment deviates considerably from the truth. The soundness of fastest route finding relates greatly to the prediction accuracy of travel time for vehicles on the driving road. Hence, it is necessary to predict the traffic conditions of road segments at an appropriate future moment.
Urban traffic prediction usually utilizes historical and current traffic flow information to predict road conditions at future moments [6]. Most existing methods either derive prediction trends from statistics of the time-dependent evolution of the current road or work only on spatial relationships between segments, without considering the temporal sequence and temporal-spatial regularities. Although available spatial-temporal information is combined to model the traffic network pattern in some previous works, this information is not exploited to its full potential. Traffic networks involve complicated relations in time and space. Specifically, traffic flow often contains high-dimensional, nonlinear, and nonstationary random data. In a previous work [7], a trained temporal-spatial deep learning approach, DeepSense, was proposed, which extracted features from the high-dimensional, nonlinear, and random traffic flow in large scale taxi GPS traces. However, we found that the prediction does not work so well at certain times, especially when there are insufficient taxi GPS traces. Based on this finding, it is proper to consider more factors, such as environmental information containing temporal changes that influence the traffic flow of the current road.
International Journal of Distributed Sensor Networks
In this paper, we propose an online-traffic-prediction based route finding mechanism for smart city, namely, O-Sense. It comprehensively utilizes both the temporal-spatial dynamic pattern in the transportation network and temporally related environmental information to predict the travel cost of each route. Firstly, we analyze temporal and spatial information of taxi GPS traces via a Restricted Boltzmann Machine (RBM), since it can process this information into low-dimensional features. The extracted low-dimensional data is then fed into a support vector machine (SVM) to obtain a robust classifier that lays the groundwork for the subsequent prediction. Secondly, a linear-chain CRF predicts the future traffic state given the previous state in a temporal sequence and supplementary temporally related observations, since CRF is well known for being applicable to mutually independent road segments. With the auxiliary data utilized by CRF, prediction accuracy can be further enhanced through a dynamic weighted classifier fusion. Finally, using the results calculated from the predicted traffic conditions of each road, our system can perform fast route selection for users.
The major contributions of this paper can be summarized into the following aspects.
(1) We propose an online-traffic-prediction based route finding mechanism, O-Sense, which innovatively adopts a trained temporal-spatial deep learning approach. Significant traffic features can be extracted from the information synthesized from high-dimensional, nonlinear, and random large scale taxi GPS traces.
(2) By combining temporally related environmental resources, which are utilized in CRF as supplementary features, with dynamic weighted classifier fusion, the prediction accuracy can be further improved. As a result, by taking full advantage of temporal and spatial information for traffic forecasting, O-Sense can effectively reflect the dynamic pattern shown in the transportation network.
(3) Utilizing large scale taxi GPS traces and environmental information in Wuhan, we evaluate the mechanism from a systematic perspective. Experimental results show that the O-Sense mechanism can achieve sound and robust performance in prediction accuracy.
The remainder of this paper is organized as follows. Section 2 discusses related works. Section 3 presents the proposed O-Sense mechanism. Section 4 describes the processes of the traffic flow condition prediction. The performance of O-Sense is evaluated in Section 5. Finally, we make a conclusion in Section 6.

Related Works
2.1. Route Finding. Many works have been proposed for the fastest route finding problem, most of which can be divided into two categories based on the characteristics of the travel cost of road segments. On one hand, some works infer potential travel costs of road segments by exploiting historical trajectory data [8]. In [9], Tsuyoshi and Sugiyama treat the potential cost of each road as a spatial proximity-regularized trajectory regression problem. Gonzalez et al. find the fastest route based on the important driving and speed patterns learned from historical trajectory data [10]. However, these works assume static travel costs and therefore cannot reflect the dynamic variations of actual traffic conditions. On the other hand, some works take the temporal dynamics of road travel costs into consideration to compute the fastest route [11]. Zheng and Ni learn the temporal dynamics of road travel costs, but their approach depends on integrated trajectories, which cannot provide enough information when traffic information is insufficient [12]. In general, the pattern of the temporal dynamics of latent costs for road segments is hard to predict, especially in cases with insufficient trajectory data.

Traffic Flow Condition Prediction.
Most of the existing traffic flow condition prediction approaches merely utilize the time-dependent evolution rule of traffic flow for prediction. In [13], the autoregressive integrated moving average model (ARIMA) simply relies on historical traffic flow data of the forecasted point, taking temporal variation into consideration for prediction. The traffic flow prediction model based on a $k$th-order Markov chain counts the transition probability of the traffic flow state with historical data of the forecasted road [14]. Coupled with the current traffic situation and the state transition probability, the traffic condition in a period of time can be predicted. However, because historical and real-time traffic patterns are both used, establishing the model in time sequence takes longer and the forecast duration is prolonged to some extent. As a purely statistical method, it also carries considerable uncertainty. The variance-entropy-based clustering approaches [15] are used to estimate the travel time distribution in different time slots with historical data of object sites, which simply applies the idea of clustering analysis and generality.
There are also other methods that use spatial information in the transportation network to analyze the trend of traffic flow. Castro et al. [16] employ adjacent road information to forecast future traffic conditions, which does not take into account the influence of distant segments and the temporal-spatial correlation of the predicted road. Markov logic networks [17] have also been adopted to predict the traffic conditions at several locations simultaneously for different future times.
The methods mentioned above merely model traffic flow trends with respect to one aspect. Temporal-spatial information has not been made use of to depict the characteristics of nonlinearity and randomness in the flow from an angle of the whole network. A Bayesian network approach [18] intended for traffic flow forecast integrates information of adjacent links and its spatial-temporal information in a transportation network. Sun and Zhang develop a selective random subspace predictor (SRSP) model [19]   the given road link. Although considering spatial-temporal information, these approaches did not extract the characteristics of high-dimension, nonlinearity, and randomness effectively.

Preliminary
Definition 1 (route). A route R is a group of consecutive road segments, $R: r_1 \to r_2 \to \cdots \to r_n$, where $r_i$ is the $i$th road segment ($1 \le i \le n$). In route R, the end point of one road segment is the start point of its directly succeeding road segment; that is, $r_i.\mathrm{end} = r_{i+1}.\mathrm{start}$.
Definition 2 (taxi trajectory). A trajectory is a time series of GPS points for a trip, where there is a geospatial coordinate set and a timestamp at each consecutive point.
Definition 3 (road segment/link). A road link or segment is defined as a directed edge consisting of a direction symbol, two terminal points, and a length between crossroads.
Definition 4 (traffic flow condition). We select traffic flow to denote the traffic state of road segments. Because speed profiles and flow limits differ across road segments, classifying the traffic state by absolute speed alone is inaccurate. According to the degree of traffic congestion, we categorize the traffic conditions into five states, represented by Ω = {congesting (Cg), slow (Sl), normal (Nm), moderate (Md), unimpeded (Un)}.
In detail, for the road segment R from 0:00 to 24:00, a traffic flow is collected every time interval $\Delta$ (e.g., 15 minutes), thus obtaining a series of traffic flow observations $O$ (e.g., $o_1, o_2, o_3, \ldots, o_{96}$). In order to categorize these series into the aforementioned five states, we adopt the $k$-means algorithm to cluster them into 5 subsets. By utilizing the average values of these 5 subsets, we can acquire their corresponding traffic states.
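The clustering step of Definition 4 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the quantile-based initialization, and the sample observations are hypothetical, and the observations are assumed to be speed-like, so that the cluster with the lowest mean corresponds to the congested state.

```python
STATES = ["Cg", "Sl", "Nm", "Md", "Un"]  # ordered from congested to unimpeded

def kmeans_1d(values, k=5, iters=100):
    """Lloyd's k-means on scalar observations; returns the sorted cluster means.

    Needs at least k observations. Centers start at evenly spaced quantiles
    (an assumption made here to keep the sketch deterministic).
    """
    ordered = sorted(values)
    n = len(ordered)
    centers = [float(ordered[(2 * i + 1) * n // (2 * k)]) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)

def label_states(values, centers):
    """Assign each observation the state of its nearest cluster mean."""
    return [STATES[min(range(len(centers)), key=lambda c: abs(v - centers[c]))]
            for v in values]

# Hypothetical observations for one road segment.
observations = [10, 12, 20, 22, 30, 32, 40, 42, 50, 52]
centers = kmeans_1d(observations)
labels = label_states(observations, centers)
```

In practice the 96 daily observations of a segment would replace the toy list, and the five cluster means would serve as the representative values of the five states.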

Framework.
In order to learn online road travel costs for selecting the fastest routes for users, we propose an online-traffic-prediction based route finding mechanism, namely, O-Sense. As shown in Figure 1, implementing our mechanism for fastest route finding entails three procedures, namely, preprocessing, comprehensive feature learning, and route computing.
3.2.1. Preprocessing. Spatial trajectories collected by GPS-equipped vehicles are mapped onto a road network using a map-matching algorithm [20] and then stored into a taxi traces database. Environmental information is extracted to be used as temporally related features, as described in detail in Section 4.

Comprehensive Feature Learning. This process contains three parts: temporal-spatial feature deep learning, temporally related supplementary feature learning, and dynamic weighted classifier fusion. Firstly, to learn temporal-spatial features, we apply a deep learning approach to the preextracted temporal-spatial traffic flow information from the taxi traces database. The preextracted temporal-spatial features are used to train the RBM model, which is capable of capturing irregular and stochastic factors in traffic systems. By means of this trained model, the dimensionality and redundancy of the input features are reduced to a proper extent. The extracted features can then be classified more effectively by the SVM classification engine, which is designed to utilize the extracted features to better train the expected prediction. Secondly, in order to learn supplementary temporally related features, the CRF approach is trained to estimate the temporal state sequence of traffic flow with the extracted environmental information. The temporal state sequences obtained from CRF are then matched for future traffic state prediction. The last technique adopted is dynamic weighted classifier fusion, which comprehensively learns the temporal-spatial information and the temporally related features. After real-time traffic flow data is preprocessed and temporally related features of the forecasted time are preextracted, we can get an eventual prediction result. Details are presented in Section 4.

Online Route Computing.
Given the prediction of each road segment at different times, a detailed route computing approach is performed in O-Sense to seek an optimal route. This approach utilizes the traffic condition of road segments in real-time for route computing. The process of online route computing is illustrated in Algorithm 1. In this paper, we consider that the transportation networks exhibit the "FIFO" (First-In-First-Out) property; that is, if A and B visit node $v_1$ at two different times $t_1$ and $t_2$, where $t_1 \le t_2$, then A always arrives at node $v_2$ before B for any arc $(v_1, v_2)$. Our solution is composed of the following steps.
Step 1. To find the fastest route given a starting point, a destination, and a departure time, O-Sense first continually predicts a group of traffic states for each road segment that connects two points, every two minutes over the following time period (e.g., the following two hours). As the traffic condition division is based on a specific clustering method, each traffic condition state takes a speed value as its clustering center. Hence, we can utilize five speed values to represent the five states, and they form a speed vector. Utilizing the speed vector to represent predicted traffic states, the weights of each road segment at different moments are obtained.
Step 2. O-Sense dynamically chooses, from the predicted speed vector of each road segment, the predicted value that is closest to the actual value at the time when the user arrives at the road segment and uses it as the weight of that road segment. We can then find the time-dependent fastest route using a modified Dijkstra algorithm [21,22].
The fastest path connecting consecutive road segments is eventually found as we arrive at the destination. Owing to the different time costs of road segments, starting at different times may yield different fastest routes even for the same start point and destination.
Then turn to E, updating the labels. Continuing the search in this way, the fastest route from A to G starting at the current time $t_{ct}$ is A → B → E → F → G. However, if starting at $t_{ct} + t_{lr}$, where $t_{lr}$ denotes some time later than $t_{ct}$, the route A → B → C → F → G becomes the fastest route: when we arrive at B, the time stamp has become $t_{ct} + t_{lr}$ plus the travel time of A → B for a departure at $t_{ct} + t_{lr}$, and the travel time of continuing through E is then larger than that of continuing through C. Hence, our approach utilizes the traffic condition of road segments in real-time (when actually driven) for route computing.
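The time-dependent search of Steps 1 and 2 can be sketched as follows. This is a minimal illustration in Python, not Algorithm 1 itself: the function name `fastest_route`, the graph layout, and all travel times are hypothetical, and each arc is assumed to carry a function that returns the predicted travel time for a given entry time. Under the FIFO property, a label-setting search of this kind remains correct.

```python
import heapq

def fastest_route(graph, source, target, depart):
    """Time-dependent Dijkstra.

    graph[u] is a list of (v, cost_fn) pairs; cost_fn(t) is the predicted
    travel time on arc (u, v) when the arc is entered at time t.
    Returns (path, total_travel_time), or (None, inf) if unreachable.
    """
    arrival = {source: depart}   # earliest known arrival time per node
    parent = {}
    heap = [(depart, source)]
    settled = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == target:
            break
        for v, cost_fn in graph.get(u, []):
            t_v = t + cost_fn(t)  # weight looked up at the actual entry time
            if v not in arrival or t_v < arrival[v]:
                arrival[v] = t_v
                parent[v] = u
                heapq.heappush(heap, (t_v, v))
    if target not in arrival:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1], arrival[target] - depart

def const(c):
    """Arc whose travel time does not depend on the entry time."""
    return lambda t: c

# Hypothetical network: arc B -> E becomes congested from t = 20 onwards.
toy_graph = {
    "A": [("B", const(5))],
    "B": [("E", lambda t: 5 if t < 20 else 15), ("C", const(8))],
    "E": [("F", const(5))],
    "C": [("F", const(5))],
    "F": [("G", const(5))],
}

early_path, early_time = fastest_route(toy_graph, "A", "G", depart=0)
late_path, late_time = fastest_route(toy_graph, "A", "G", depart=30)
```

With these toy weights, a departure at time 0 goes through E, whereas a departure at time 30 reaches B after the congestion on B → E has started and therefore goes through C.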

Comprehensive Feature Learning Based Prediction
This section details the methodology of the comprehensive feature learning based online traffic condition prediction mechanism, which consists of temporal-spatial feature deep learning, temporally related supplementary feature learning, and dynamic weighted classifier fusion-based comprehensive prediction.

Temporal-Spatial Traffic Feature Deep Learning.
To simplify and reduce the dimension of the input data in order to efficiently learn the significant traffic features, O-Sense employs the temporal-spatial feature deep learning approach, which includes three main steps: feature preextraction, feature learning, and training an SVM model.

Feature Preextraction.
The correlation coefficient approach is first used to select the most correlated road segments between adjacent road segments and the predicted point, and then extracted features of these road segments are input into PCA to reduce data dimension and remove redundant information for better prediction.
Correlation Coefficient Approach. Considering samples $\{x_i(t), x_j(t_{ct}+\tau)\}$ ($i, j = 1, 2, \ldots, n$) in a spatially accessible space, $x_i(t)$ ($i = 1, 2, \ldots, n$) represents the traffic flow of road segment $i$ at time stamp $t$ and $\bar{x}_i$ represents its mean value. Likewise, $x_j(t_{ct}+\tau)$ stands for the traffic flow of road segment $j$ at the predicted time $(t_{ct}+\tau)$ and $\bar{x}_j$ stands for its mean value, where $t_{ct}$ is the current time and $\tau$ is the duration of the prediction. The Pearson correlation coefficient $\rho_{ij}(\tau)$ between $x_i(t)$ and $x_j(t_{ct}+\tau)$ is defined as follows:
$$\rho_{ij}(\tau) = \frac{\sum \left(x_i(t)-\bar{x}_i\right)\left(x_j(t_{ct}+\tau)-\bar{x}_j\right)}{\sqrt{\sum \left(x_i(t)-\bar{x}_i\right)^2}\,\sqrt{\sum \left(x_j(t_{ct}+\tau)-\bar{x}_j\right)^2}},$$
where the sums run over the samples. The closer the absolute value of the spatial and temporal correlation coefficient is to 1, the higher the correlation. The information of the most correlated road segments and the corresponding time stamps is then chosen as extracted features, that is, six parameters $(F_m, c_m, \Delta_{mx}, d_{mx}, w_m, l_m)$. $F_m$ and $c_m$ denote the average speed and the certain state of the $m$th road segment at the corresponding time stamp. $\Delta_{mx}$ and $d_{mx}$ denote the interval of the time stamp and the geodistance between the predicted road segment and its $m$th most correlated road segment, respectively. $w_m$ and $l_m$ denote the width and length of the $m$th most correlated road segment.
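The correlation-based selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the keying of candidate series by a (segment, lag) pair, and the sample values are hypothetical, and each candidate series is assumed to be aligned sample by sample with the flow of the predicted segment at the prediction time.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)

def most_correlated(target_future, candidates, top_m):
    """Return the keys of the top_m candidates ranked by |correlation|.

    candidates maps a hypothetical (segment_id, lag) key to a flow series.
    """
    ranked = sorted(candidates.items(),
                    key=lambda item: abs(pearson(item[1], target_future)),
                    reverse=True)
    return [key for key, _ in ranked[:top_m]]

# Hypothetical flows: target segment at the prediction time, and candidates.
target = [1, 2, 3, 4, 5]
candidates = {
    ("s1", 1): [2, 4, 6, 8, 10],   # perfectly positively correlated
    ("s2", 1): [5, 4, 3, 2, 1],    # perfectly negatively correlated
    ("s3", 2): [1, 3, 2, 5, 4],    # moderately correlated
}
selected = most_correlated(target, candidates, top_m=2)
```

Ranking by the absolute value keeps strongly negatively correlated segments as well, in line with the criterion that an absolute value closer to 1 means higher correlation.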
Principal Component Analysis. After extracting features by the correlation coefficient approach, Principal Component Analysis (PCA) is adopted to reduce data dimension and compress the data volume efficiently. The features of the most correlated road segments construct the original data matrix $X = (x_i(j))$. In order to overcome such problems as the disunity of the original data, the raw data is normalized as follows:
$$\tilde{x}_i(j) = \frac{x_i(j) - \bar{x}(j)}{s(j)},$$
where $s^2(j) = \frac{1}{5}\sum_{i=1}^{6}\left[x_i(j) - \bar{x}(j)\right]^2$ is the sample variance and $\bar{x}(j) = \frac{1}{6}\sum_{i=1}^{6} x_i(j)$ is the sample mean. Then the covariance matrix of traffic flow in road segments can be constructed as follows:
$$C = \left\langle \tilde{x}\,\tilde{x}^{T} \right\rangle,$$
where $\langle x \rangle = (1/n)\sum x$ denotes the sample average. All eigenvalues of the covariance matrix are found through matrix computation, and the sequence is $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_6 \ge 0$.
As the leading principal components contain most of the information, we only need to extract part of the components. The contribution rate of each eigenvalue $\lambda_i$ is
$$\frac{\lambda_i}{\sum_{j=1}^{6}\lambda_j}.$$
The cumulative contribution rate of the preceding $p$ principal components is computed as $\sum_{i=1}^{p}\lambda_i \big/ \sum_{j=1}^{6}\lambda_j$. When this contribution rate is larger than a constant value $\eta$, we only need to extract the first $p$ principal components, which reflect the main information of all variables. Finally, the sample can be projected on the selected feature vectors and an $n \times p$ feature matrix is obtained for feature learning.
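The component selection rule can be sketched as follows. The function name and the eigenvalues are hypothetical (they are not the values of Table 2); only the rule itself, keeping the smallest number of leading components whose cumulative contribution rate exceeds the threshold, follows the text.

```python
def select_components(eigenvalues, threshold=0.95):
    """Return (p, rates): the number of leading principal components whose
    cumulative contribution rate first exceeds the threshold, and the
    contribution rate of every eigenvalue in descending order."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rates = [lam / total for lam in ordered]
    cumulative = 0.0
    for p, rate in enumerate(rates, start=1):
        cumulative += rate
        if cumulative > threshold:
            return p, rates
    return len(rates), rates  # threshold never exceeded: keep everything

# Hypothetical eigenvalues of a 6 x 6 covariance matrix.
p, rates = select_components([4.0, 1.0, 0.5, 0.3, 0.15, 0.05], threshold=0.95)
```

For these illustrative eigenvalues the first three components explain about 91.7 percent of the variance and the first four about 96.7 percent, so four components are kept.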

Feature Learning with RBM. The main idea of the deep learning approach RBM in O-Sense is that plenty of preextracted temporal-spatial traffic data are processed into informative low-dimensional features, and then these learned features are used to train an efficient SVM classifier.
RBM is a generative stochastic neural network that can learn a probability distribution (e.g., $P(v, h)$) over the input set, which is good at reasoning about and predicting irregular and stochastic behavior in the traffic flow. As shown in Figure 3, RBM is in the shape of a bipartite graph with no intralayer connections. The hidden unit activations $h$ (low-dimensional data) are mutually independent given the visible unit activations $v$; that is, when $v$ is given, all hidden units are conditionally independent, $P(h \mid v) = P(h_1 \mid v) \cdots P(h_n \mid v)$, and vice versa. Given the visible layer $v$, the hidden layer $h$ can be obtained through $P(h \mid v)$; meanwhile, the values of the visible units are obtained from the hidden units through $P(v \mid h)$. Through adjusting the parameters, the visible layer $v_1$ obtained from the hidden layer can approximately equal the original input layer $v$. From this perspective, the outputs of hidden units are another representation of the visible units; namely, the original high-dimensional traffic information has been transformed into low-dimensional representative data.
The standard type of RBM has binary-valued hidden and visible units and consists of a matrix of weights $W = (w_{ij})$, in which $w_{ij}$ denotes the connection weight between hidden unit $h_j$ and visible unit $v_i$, $a_i$ denotes the bias of $v_i$, and $b_j$ denotes the bias of $h_j$. Given parameters $\theta = (W, a, b)$, the energy of a configuration $(v, h)$ is defined as
$$E(v, h \mid \theta) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i}\sum_{j} v_i w_{ij} h_j.$$
This energy function is analogous to that of a Hopfield network [23]. As in general Boltzmann machines, probability distributions over hidden and visible vectors are defined in terms of the energy function:
$$P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)},$$
where $Z(\theta)$ is a partition function defined as the sum of $e^{-E(v, h \mid \theta)}$ over all possible configurations. Similarly, the marginal probability of a visible vector of Booleans is the sum over all possible hidden layer configurations:
$$P(v \mid \theta) = \frac{1}{Z(\theta)} \sum_{h} e^{-E(v, h \mid \theta)}.$$
Then the probability that $h_j$ equals 1 given $v$ and the probability that $v_i$ equals 1 given $h$ are as follows:
$$P(h_j = 1 \mid v, \theta) = \sigma\Big(b_j + \sum_{i} v_i w_{ij}\Big), \qquad P(v_i = 1 \mid h, \theta) = \sigma\Big(a_i + \sum_{j} w_{ij} h_j\Big),$$
where $\sigma(x) = 1/(1 + e^{-x})$ denotes the logistic sigmoid function and the parameters $\theta = (W, a, b)$ can be learned by the gradient-based contrastive divergence algorithm [24].
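The quantities above can be sketched for a binary RBM as follows. This is a minimal pure-Python illustration, not the trained network of O-Sense: the function names, the learning rate, and the toy parameters are hypothetical, and the update shown is a single step of contrastive divergence with one Gibbs step (CD-1), which is one common variant of the algorithm.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def energy(v, h, W, a, b):
    """Energy of a configuration (v, h); W[i][j] links v_i and h_j."""
    visible_term = sum(a[i] * v[i] for i in range(len(v)))
    hidden_term = sum(b[j] * h[j] for j in range(len(h)))
    pair_term = sum(v[i] * W[i][j] * h[j]
                    for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + pair_term)

def p_hidden(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def p_visible(h, W, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]

def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_step(v0, W, a, b, rng, lr=0.1):
    """One CD-1 parameter update, applied in place; returns P(h | v0)."""
    ph0 = p_hidden(v0, W, b)
    h0 = sample(ph0, rng)
    v1 = sample(p_visible(h0, W, a), rng)   # one-step reconstruction
    ph1 = p_hidden(v1, W, b)
    for i in range(len(a)):
        for j in range(len(b)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        a[i] += lr * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])
    return ph0

# Toy model: 3 visible units, 2 hidden units, all parameters zero.
rng = random.Random(0)
W = [[0.0, 0.0] for _ in range(3)]
a = [0.0, 0.0, 0.0]
b = [0.0, 0.0]
features = cd1_step([1, 0, 1], W, a, b, rng)
```

After training, the vector returned by `p_hidden` for an input plays the role of the low-dimensional representation that is passed on to the classifier.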

Temporally Related Supplementary Feature Learning.
The data distribution shows that taxi GPS traces are sufficient during the day but insufficient at midnight. When traffic information is insufficient at certain times, the deep learning approach, which only utilizes big data for traffic condition prediction, is not effective enough [25]. Through supplementary feature learning, environmental information relevant to the traffic condition of each road segment can be taken into account, making the prediction more accurate. This approach involves three main processes: environmental feature extraction, CRF real-time estimation, and sequence segments matching.

Environmental Feature Extraction.
The traffic condition can be reflected by environmental information. Most of the road noise is emitted by vehicles, so the observed patterns of noise can reflect the traffic conditions under which it was produced [26,27]. A high rainfall intensity also influences the traffic condition [28]. Owing to the reduced visibility and pavement friction, the effective travel times are greater than the mean travel time, as expected. In addition, the rainfall intensity also influences the travelers' path choice.
If the effective travel time on one path is less sensitive to the rainfall than that on other paths, most travelers choose or switch to that path when a higher rainfall intensity is likely. A high wind speed results in a more hazardous trip, and travel times become longer and less reliable, which leads some travelers to defer or cancel trips [29]. The impact of temperature and PM2.5 is not very clear, but a good traffic condition is more likely when the temperature is high and PM2.5 is low.
When there is little traffic information at midnight, environmental information especially road noise can well reflect traffic condition. Road noise is mainly affected by traffic flow at night, which is not the same as during the day that contains interference factors. Hence, the sparse period of traffic information especially at midnight is quite suitable for being supplied with environmental information.

CRF Real-Time Estimation.
The characteristic mode and temporal evolution rule of the traffic flow state are investigated by using the linear-chain CRF method [30], considering the supplementary environmental features of the predicted road segment.
A linear-chain CRF is a discriminative, undirected probabilistic graphical learning model. The advantage of CRFs over hidden Markov models (HMMs) is the relaxation of the independence assumptions between features. HMMs are necessarily local in nature because they are constrained to binary transition and emission feature functions, which forces each observation to depend only on the current label and each label to depend only on the previous label; CRFs, however, can use more global features. Additionally, CRF can obtain the global optimal value with the global normalization of all features. The special case of a linear-chain CRF is built as
$$P(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{t}\sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t) + \sum_{t}\sum_{l} \mu_l g_l(y_t, x, t)\Big),$$
where $f_k(y_{t-1}, y_t, x, t)$ is the transfer feature function depending on the current time $t$ and the previous time $t - 1$, $g_l(y_t, x, t)$ represents the state feature function depending on the time $t$, $y_t$ denotes the traffic flow state at time $t$, and $Z(x)$ is the normalization factor. When the known deterministic feature conditions are met, the value of $f_k$ or $g_l$ is 1, otherwise 0. The weights $\lambda_k$ and $\mu_l$ can be estimated from training data.
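Assuming that the training dataset $\mathrm{Data} = \{x^{(i)}, y^{(i)}\}$ consists of samples that are independent of each other, maximum likelihood estimation of $P(y \mid x, \theta)$ is used to learn the parameters $\theta$ by the gradient descent algorithm [31]:
$$\hat{\theta} = \arg\max_{\theta} \sum_{i} \log P\big(y^{(i)} \mid x^{(i)}, \theta\big).$$
Assigning each feature function a weight and given an observation sequence x, (12) can be transformed as follows:
$$P(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{k} w_k F_k(y, x)\Big),$$
where $F_k(y, x)$ is the $k$th feature function summed over the sequence and $w_k$ is its weight. Through the above equation, the probabilities of the five possible states for $y_t$ can be informally denoted as a five-dimensional vector:
$$P_{CRF}(t) = \big(P(y_t = \mathrm{Cg} \mid x),\ P(y_t = \mathrm{Sl} \mid x),\ P(y_t = \mathrm{Nm} \mid x),\ P(y_t = \mathrm{Md} \mid x),\ P(y_t = \mathrm{Un} \mid x)\big).$$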
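How a five-dimensional state probability vector can be read off a linear-chain CRF is sketched below with the forward-backward recursion. This is an illustration under stated assumptions: the weighted feature sums are taken as already computed and are passed in as plain score tables, the function name and the toy scores are hypothetical, and no numerical scaling is applied, so the sketch is only suitable for short sequences.

```python
import math

STATES = ["Cg", "Sl", "Nm", "Md", "Un"]

def crf_marginals(state_scores, trans_scores):
    """Per-time-step marginals P(y_t = k | x) of a linear-chain CRF.

    state_scores[t][k]: weighted sum of state features for label k at time t.
    trans_scores[j][k]: weighted sum of transfer features for j -> k.
    """
    T, K = len(state_scores), len(state_scores[0])
    psi = [[math.exp(s) for s in row] for row in state_scores]
    phi = [[math.exp(s) for s in row] for row in trans_scores]

    # Forward pass: alpha[t][k] sums over all label prefixes ending in k.
    alpha = [[0.0] * K for _ in range(T)]
    alpha[0] = psi[0][:]
    for t in range(1, T):
        for k in range(K):
            alpha[t][k] = psi[t][k] * sum(alpha[t - 1][j] * phi[j][k]
                                          for j in range(K))

    # Backward pass: beta[t][j] sums over all label suffixes after time t.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for j in range(K):
            beta[t][j] = sum(phi[j][k] * psi[t + 1][k] * beta[t + 1][k]
                             for k in range(K))

    z = sum(alpha[T - 1])  # normalization factor Z(x)
    return [[alpha[t][k] * beta[t][k] / z for k in range(K)]
            for t in range(T)]

# Toy scores: two time steps, no transition preference, and a state
# feature that favours "Cg" at the second step.
state_scores = [[0.0] * 5, [math.log(2.0), 0.0, 0.0, 0.0, 0.0]]
trans_scores = [[0.0] * 5 for _ in range(5)]
marginals = crf_marginals(state_scores, trans_scores)
```

Each row of `marginals` is a probability vector over the five states at one time stamp, ordered as in `STATES`.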

Sequence Segments Matching.
After applying CRF classifier to output a state vector of traffic flow at each time stamp, a state sequence segment can be obtained. Through finding similar state segments in history, we can obtain the state of traffic flow at the predicted time stamp according to the sequence matching with historical sequence segments. Since there may be no identical state sequence segments, measuring the similarity between sequence segments is necessary for sequence matching. Commonly used Euclidean distance is not suitable for the distance measurement of time-series sequences, which leads to the utilization of a nonlinear time alignment approach, namely, dynamic time warping (DTW).
In the process of state sequence segment matching, assuming that the current time is $t_{ct}$, a traffic state sequence segment previously obtained from the CRF classifier is represented as $S = (s_{t_{ct} - k\cdot\Delta + 1}, \ldots, s_{t_{ct} - \Delta}, s_{t_{ct}})$ and a predicted traffic state after time $\tau$ can be denoted as $s_{t_{ct}+\tau}$. The DTW-based segment matching algorithm, shown in Algorithm 2, is employed to procure adequate matching segments by measuring DTW distances between the given state sequence segment $S$ and historical segments $S^{*} = (s^{*}_{t_{ct} - k\cdot\Delta + 1}, \ldots, s^{*}_{t_{ct} - \Delta}, s^{*}_{t_{ct}})$. In this algorithm, the predicted sequence segment is warped nonlinearly in the time dimension to determine the similarity with historical segments. A constant threshold $\varepsilon$ is defined to filter adequate traffic flow state segments in history, namely those whose DTW distances are less than the chosen threshold; that is, $\mathrm{DTW}(S, S^{*(i)}) < \varepsilon$. Here $S^{*(i)}$ is the $i$th adequate historical sequence segment and $P^{(i)}(s^{*} = \omega)$ ($\omega \in \Omega$) represents the probabilities of the five possible states at the predicted time $t_{ct}+\tau$ according to the $i$th historical segment $S^{*(i)}$. With $w^{(i)}$ denoting the weight of the $i$th historical segment, we can obtain the weighted vector at the predicted time ($t = t_{ct}+\tau$) as the final state result.
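The matching step can be sketched as follows. This is not Algorithm 2: the function names, the integer encoding of the five states (0 to 4), and the sample history are hypothetical, each historical segment is simplified to contribute only the single state that followed it, and the inverse-distance weights are an assumption made for illustration, since the weight formula is not reproduced here.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def match_segments(current, history, threshold):
    """Keep historical segments whose DTW distance is below the threshold.

    history is a list of (segment, next_state) pairs; the result is a list
    of (distance, next_state) pairs for the adequate segments.
    """
    matches = []
    for segment, next_state in history:
        dist = dtw_distance(current, segment)
        if dist < threshold:
            matches.append((dist, next_state))
    return matches

def weighted_state_vector(matches, n_states=5):
    """Combine the matched segments into a state probability vector.

    ASSUMPTION: weights are proportional to 1 / (1 + distance).
    """
    weights = [1.0 / (1.0 + dist) for dist, _ in matches]
    total = sum(weights)
    vector = [0.0] * n_states
    for weight, (_, state) in zip(weights, matches):
        vector[state] += weight / total
    return vector

# Hypothetical current segment and history (states encoded 0..4).
current = [1, 2, 3]
history = [([1, 2, 2, 3], 3), ([4, 4, 4], 0), ([1, 2, 4], 4)]
matches = match_segments(current, history, threshold=2.0)
vector = weighted_state_vector(matches)
```

The first historical segment differs from the current one only by a repeated state, so warping aligns it at zero distance, which a point-by-point Euclidean comparison of unequal-length segments could not do.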

Dynamic Weighted Classifier Fusion.
During the daytime, the taxi traffic operation characteristics, which provide ample data of arterial roads, can objectively reflect real road traffic conditions of the city to a certain extent. However, applying supplementary information, especially road noise, to predict the traffic condition may be affected by interference factors such as man-made noise during the daytime. In the early morning hours, pure road noise is more likely to reflect the road traffic conditions well, which makes it possible to achieve higher prediction accuracy than that achieved with insufficient taxi GPS traces. Clearly, the two approaches have different prediction accuracies at different times of day. Comprehensively considering temporal-spatial features and supplementary features, a dynamic weighted classifier fusion is utilized to fuse the two approaches to obtain a better result. We give scores to the prediction effects of the two approaches every hour in one day, represented as $\mathrm{score}^{d}_{HR,\mathrm{CRF}}$ and $\mathrm{score}^{d}_{HR,\mathrm{DL}}$, where HR represents the HRth hour in one day and $d$ denotes the $d$th day in history. Sampling num times per hour, we compare the predicted state $y_j$ with the actual state $y_{j,\mathrm{gt}}$ at each sampling in one hour. Here $y_j$ represents the prediction result of the $j$th sample in the hour that the time stamp belongs to, while $y_{j,\mathrm{gt}}$ is the corresponding ground truth, which is defined in the evaluation part. If $y_j$ agrees with the ground truth, a positive value considering the probability of this state is assigned to the score; otherwise the score is assigned a negative value. Let $\mathrm{score}^{d}_{j,HR,\mathrm{CRF}}$ represent the score of the $j$th sample in the hour HR on day $d$. The sum of all scores in the hour HR of day $d$ is calculated as follows:
$$\mathrm{score}^{d}_{HR,\mathrm{CRF}} = \sum_{j=1}^{\mathrm{num}} \mathrm{score}^{d}_{j,HR,\mathrm{CRF}}.$$
Similarly, we can also get $\mathrm{score}^{d}_{j,HR,\mathrm{DL}}$ and $\mathrm{score}^{d}_{HR,\mathrm{DL}}$. The larger the expectation of scores in a given hour HR over all historical days, the better the prediction performs. However, the larger the variance of scores in a given hour, the worse the prediction effect. Assuming that there are V days in the training dataset, the prediction effects of the two approaches in hour HR are defined from the expectations and variances of the scores at the corresponding hour over the V days in history, where $\overline{\mathrm{score}}_{HR,\mathrm{CRF}}$ and $\overline{\mathrm{score}}_{HR,\mathrm{DL}}$ denote the average values of scores in hour HR over all V days for the two approaches.
In the prediction process, we apply the CRF approach and the deep learning approach to learn features separately, generating two classifications with two 5-dimensional probability vectors ($P_{CRF}$ and $P_{DL}$). The prediction result of O-Sense (denoted as $P_{OS}$) is calculated by fusing $P_{CRF}$ and $P_{DL}$ with their respective weights at time $t$ belonging to hour HR.
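The fusion step can be sketched as follows. The exact definitions of the prediction effects and of the fusion weights are not reproduced here, so this sketch substitutes a stand-in: `prediction_effect` is a hypothetical function that merely rewards a high mean score and penalizes a high variance, and the fusion weights are assumed to be the normalized effects. All scores and probability vectors are illustrative.

```python
from statistics import mean, pvariance

def prediction_effect(daily_scores):
    """Hypothetical effect of one approach in a given hour.

    ASSUMPTION (not the paper's formula): the effect grows with the mean
    of the hourly scores over the historical days and shrinks with their
    variance.
    """
    return max(mean(daily_scores), 0.0) / (1.0 + pvariance(daily_scores))

def fuse(p_crf, p_dl, effect_crf, effect_dl):
    """Weighted combination of two 5-dimensional probability vectors."""
    total = effect_crf + effect_dl
    w_crf = effect_crf / total if total > 0 else 0.5  # fall back to equal weights
    w_dl = 1.0 - w_crf
    return [w_crf * c + w_dl * d for c, d in zip(p_crf, p_dl)]

# Hypothetical hourly scores of the two approaches on three historical days.
effect_crf = prediction_effect([2, 2, 2])
effect_dl = prediction_effect([6, 6, 6])

# Hypothetical outputs of the two classifiers (order: Cg, Sl, Nm, Md, Un).
p_crf = [1.0, 0.0, 0.0, 0.0, 0.0]
p_dl = [0.0, 1.0, 0.0, 0.0, 0.0]
p_os = fuse(p_crf, p_dl, effect_crf, effect_dl)
```

Because the weights are recomputed per hour, an approach that scored well and consistently in that hour over the historical days dominates the fused vector, which is the behavior the text describes for daytime versus early morning.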

Datasets.
The datasets of Wuhan city are used for the evaluation of our traffic flow prediction and route computing. A representative region is selected to verify the validity of our mechanism. The following datasets are used.
(1) Taxi Trajectories: the trajectory datasets are generated by 30,000 taxis over a period of three months.
(2) Road Network: the road network of Wuhan is adopted to perform the experiments. In Figure 5, a snapshot of the Wuchang district road network in Wuhan is displayed in the rush hour (5 pm).
(3) Environmental data: from a public website, we collect environmental information which contains road noise, temperature, wind speed, PM2.5, and rainfall.
(4) Ground truth: the actual traffic flow which can be known from camera sensor on the road is used as the ground truth to measure the prediction accuracy.

Evaluation of Temporal-Spatial Deep Learning Approach.
Several parameters need to be determined for the temporal-spatial deep learning approach. Since different network structures clearly influence the forecast results, the effect of the parameters needs to be tested. One of the most important problems in designing a neural network is to determine the size of the network. 128 nodes are chosen as the number of input layer nodes. As the number of input layer nodes changes, different weighted mean accuracy (WMA) values are obtained. As shown in Figure 6, the best choice is 64 nodes based on our experiments. More nodes increase both the redundancy and the computational expense of training the RBM model, whereas fewer nodes cannot make full use of the neural network to depict the characteristics of high dimension, nonlinearity, and randomness in the transportation network. As shown in Figure 7, road segment 3 chosen as the forecasted road segment denotes the traffic flow from upstream link to downstream road . As shown in Table 1  so forth. In this paper, we choose 5 min as the time unit and = 20.
In the PCA approach, we set the threshold to 95 percent, and the 6 eigenvalues of the covariance matrix are obtained as shown in Table 2.
After computing the cumulative contribution rate, we find that the cumulative contribution rate of the previous     Table 3.
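The selection of principal components by cumulative contribution rate can be sketched as follows. This is a minimal illustration assuming the 95 percent figure is a threshold on the cumulative contribution rate; the six eigenvalues are made-up placeholders, not the values of Table 2, and the function name is hypothetical.

```python
def components_needed(eigenvalues, threshold=0.95):
    """Return (k, rates): the smallest number k of leading components whose
    cumulative contribution rate reaches `threshold`, plus all the rates."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")

    # Cumulative contribution rate after each component, largest first.
    rates = []
    running = 0.0
    for value in ordered:
        running += value
        rates.append(running / total)

    for k, rate in enumerate(rates, start=1):
        if rate >= threshold:
            return k, rates
    return len(ordered), rates


# Placeholder eigenvalues of a 6 x 6 covariance matrix (illustrative only).
example_eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2]
k, rates = components_needed(example_eigenvalues, threshold=0.95)
```

For these placeholder values the first four components stay below the threshold and the fifth crosses it, so five components would be retained.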
To express the predicted state and its probability more clearly, we assign one of five numbers to each state; when the state numbered $k$ is predicted with probability $p$, the predicted state value of the road segment is defined as $k - 1 + p$. For example, when the predicted state of a road segment is Sl and the corresponding probability generated by the deep learning approach is 0.9, the deep learning-based predicted state value is 1.9.

Figure 8 shows the traffic state prediction of the deep learning approach with a prediction time of 15 min. There are two time slots (about 8 am and 6 pm) where the predicted values are closer to the actual values. It can be inferred that, in the rush hours, there are sufficient taxi trajectories to learn the states of traffic flow and produce a more accurate prediction. The temporal-spatial deep learning approach can effectively extract the temporal-spatial features of high-dimensional, nonlinear, and random traffic flow. However, due to the lack of sufficient taxi GPS traces, the traffic prediction accuracy in the early morning hours (from 1:00 am to 5:00 am) is significantly lower than during the daytime. As the number of vehicles increases from 6 am, the prediction accuracy improves gradually. The deep learning approach therefore cannot train an effective network to extract features when the data are sparse.

Figure 9 demonstrates the prediction result of the temporally related CRF approach with a prediction time of 15 min. According to the calculation, the prediction precision of the CRF approach is nearly 0.75. Comparing the separate predictions of the two approaches in Figures 8 and 9, it can be inferred that the deep learning approach performs much better than the CRF approach in most cases. However, the temporally related CRF approach compensates for the sparse data that limits the deep learning approach. It takes full advantage of supplementary information, especially road noise, which provides effective information to improve the prediction accuracy around midnight.
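The predicted state value used in Figures 8 and 9 (state number minus one, plus the probability of that state) can be computed from a classifier's probability vector as in the sketch below. It assumes the five states are numbered 1 to 5 in vector order and that Sl is state 2, which is inferred from the worked example (0.9 gives 1.9); the function names are hypothetical.

```python
def state_value(state_number, probability):
    """Encode a predicted state (numbered 1..5) and its probability as one
    number in the interval [state_number - 1, state_number]."""
    if not 1 <= state_number <= 5:
        raise ValueError("state_number must be between 1 and 5")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return state_number - 1 + probability


def value_from_vector(probabilities):
    """Pick the most probable state of a 5-dimensional probability vector
    and return its predicted state value."""
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    # Vector positions are 0-based, state numbers are 1-based.
    return state_value(index + 1, probabilities[index])


# Made-up classifier output in which the second state (assumed to be Sl)
# has probability 0.9.
example_vector = [0.02, 0.9, 0.05, 0.02, 0.01]
example_value = value_from_vector(example_vector)
```

The integer part of the value identifies the state and the fractional part carries the confidence, so a single curve can show both against the ground truth.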

Evaluation of Dynamic Weighted Classifier Fusion.
In the following experiments, Figure 10 presents the expectations and the standard deviations of the scores obtained by the two approaches within the same hour. Two observations can be made from Figure 10(a). First, the deep learning approach performs well in the daytime, when ample temporal-spatial traffic features are available. Second, the CRF approach performs better when the data in the traffic network are sparse. As shown in Figure 10(b), the CRF approach is unstable in the daytime because supplementary features such as road noise are then caused by many active sources rather than by vehicles alone.
The dynamic weights of the two approaches for each hour of one day are shown in Figure 11, which reveals that fusing the two approaches with time-varying weights makes better use of the advantages of both classifiers. Figure 12 presents the better results obtained by fusion at different times of one day. For example, between 1 am and 5 am, due to the sparse data in the traffic network, the deep learning approach may have a low degree of recognition. However, CRF can obtain higher prediction precision during this period by extracting temporally related supplementary features. Hence, more accurate results can be gained by O-Sense, which combines the two approaches. Figure 13 shows the precision, the recall, and the F1-measure of the three approaches. As shown in Figures 13(a) and 13(b), under the same prediction time, O-Sense obtains 8% higher precision and recall than existing deep learning approaches. In Figure 13(c), the F1-measure, which considers both the precision and the recall, also reveals that O-Sense obtains high accuracy. O-Sense comprehensively uses temporal-spatial traffic flow information and temporally related supplementary features in the traffic network. We can conclude that the effect of O-Sense is closer to that of the deep learning approach. Moreover, by combining the temporally related supplementary features used by the CRF approach to overcome the problem of sparse data in deep learning, the prediction result of O-Sense can be further improved.

As shown in Figure 14, two routes are selected to evaluate the performance of our mechanism. Route 1 runs from Wuhan University to Hankou Railway Station, and route 2 runs from Wuhan University to Guangda Masion. Route 1 has a longer path length and costs much more travel time than route 2, which can be seen in  am and 12 am on route 1 and route 2. O-Sense and STR can both estimate the travel cost of the routes at that time relatively accurately.
Secondly, the prediction performance of O-Sense is much better than that of STR in the time periods containing different traffic conditions on route 1 (about 8 am and 4:30 pm). Starting at 8 am, driving route 1 takes more than one hour to complete the whole path, and the traffic condition of route 1 varies from congested to slow and then to normal. Hence, the total travel cost of the whole route depends on the real-time traffic condition of each road segment at the moment the user actually drives on it. The STR approach, however, assumes that the traffic condition of every road segment stays the same as at the departure time. As a result, the travel cost estimated by STR is much longer than the ground truth. The situation at 4:30 pm is analogous but reversed: the travel cost estimated by STR is much shorter than the ground truth. It can be inferred that the traffic condition on route 1 becomes congested after about 30 minutes of driving, so the actual travel cost of the remaining path is much longer than that computed by STR from the initial condition of each road segment. Thirdly, route 2 takes less time than route 1 and the traffic conditions in that time period do not change much; hence STR performs better on route 2 at about 8 am and 4:30 pm.

Figure 15 shows the variation of driving speed at different places when the participant drives along route 1. Because of its static estimation policy, STR keeps the speed of each road segment the same as at the starting time, while O-Sense estimates the real-time speed for every place on the route based on the predicted arrival time. When the user sets out from the downtown area at 8 am, the traffic condition is congested. After the user has covered about 4/6 of the distance of the whole path, the traffic flow shows high variation over time and the traffic condition on the current road segment becomes unimpeded. At the beginning of the trip, during the rush hour, both STR and O-Sense can estimate the traffic condition. As the driving distance increases, however, the traffic condition of the driving area becomes unblocked. Owing to its online traffic condition prediction for the current road segment, O-Sense can perceive this variation and estimate the travel cost more accurately.
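The difference between the static estimate and the arrival-time-based estimate described above can be illustrated with a short sketch. It is a simplification under stated assumptions: `speed_at(segment, minute)` is a hypothetical stand-in for the predicted speed of a road segment at a given time, each segment is traversed at the speed found on entry, and the segment lengths and speeds are made-up numbers rather than measurements from route 1.

```python
def static_travel_time(lengths_km, speed_at, depart_min):
    """STR-like estimate: every segment uses its speed at departure time."""
    return sum(
        length / speed_at(i, depart_min) * 60.0
        for i, length in enumerate(lengths_km)
    )


def online_travel_time(lengths_km, speed_at, depart_min):
    """Arrival-time-based estimate: each segment uses the speed predicted
    for the moment the vehicle is expected to enter it."""
    clock = depart_min
    for i, length in enumerate(lengths_km):
        clock += length / speed_at(i, clock) * 60.0
    return clock - depart_min


def example_speed(segment, minute):
    """Made-up speed profile in km/h: congested until 8:30 am (minute 510
    of the day), normal afterwards, identical on every segment."""
    return 10.0 if minute < 510 else 30.0


route_km = [5.0, 5.0, 5.0]  # three illustrative 5 km segments
depart = 8 * 60             # 8:00 am, in minutes since midnight
static_min = static_travel_time(route_km, example_speed, depart)
online_min = online_travel_time(route_km, example_speed, depart)
```

In this toy case the static estimate charges the congested speed to all three segments, while the arrival-time-based estimate sees the congestion clear after the first segment, so the static figure comes out far longer, mirroring the 8 am case on route 1.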

Conclusions
This paper proposes an online-traffic-prediction based route finding mechanism, namely, O-Sense, for fastest route finding with large scale taxi GPS traces and environmental information. In O-Sense, a temporal-spatial deep learning approach with preprocessed traffic-related features effectively extracts nonlinear, random, and high-dimensional characteristics from the traffic flow changes. Using environmental features as supplementary information, the CRF classifier further reflects the dynamic patterns in the transportation network. A dynamic weighted classifier fusion approach is used to obtain a better prediction result, which helps to find the fastest route based on online route computing. We adopt trajectory datasets generated by 30,000 taxis over a period of three months in Wuhan to evaluate our approach. The experimental results show that O-Sense effectively improves the accuracy of travel cost estimation.