A Kind of Urban Road Travel Time Forecasting Model with Loop Detectors

Urban road travel time is an important parameter reflecting the state of traffic flow. It is also one of the key parameters that the traffic management department uses to formulate guidance measures, provide traffic information services, and improve the efficiency of the detector group. Improving the forecast accuracy of travel time is therefore crucial in traffic management practice. Based on change-point analysis and the ARIMA model, this paper constructs a model that forecasts travel time parameters from the massive data collected by loop detectors. First, a preprocessing algorithm for loop detector data is given, and a model for calculating the travel time is studied. Second, a change-point detection algorithm is designed to classify a long sequence of travel time data items into several patterns. Then, a forecast model is established that uses an improved ARIMA model to forecast travel time within each pattern. Finally, the model is verified by simulation, and the results for several groups of examples show that the model has high accuracy and practicality.


Introduction
The travel time (TT) refers to the average time that all vehicles need to pass a section of a road, as shown in Figure 1. If $t_{ij}$ denotes the time that vehicle $i$ ($i = 1, 2, \ldots, n$) takes to travel from detector $j$ to detector $j + 1$ ($j = 1, 2, \ldots, m - 1$), then the travel time of section $[d_1, d_m]$ can be defined as $TT = \sum_{i=1}^{n} \left( \sum_{j=1}^{m-1} t_{ij} \right) / n$. Urban road travel time is an important parameter reflecting the state of traffic flow on a road [1,2]. Based on forecast information about travel time, travelers can choose their routes sensibly [3], and the traffic management department can establish sound guidance measures [4].
Thus, the precise forecast of travel time plays an important role in improving the quality of urban traffic information services and the efficiency of the detector group on the road [5], and it has long drawn great attention from scholars.
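The definition of TT given at the start of this section can be illustrated with a small numerical sketch. The function name, the nested-list layout, and the sample times below are hypothetical and serve only to show the arithmetic: each vehicle's section times are summed, and the totals are averaged over the vehicles.

```python
def mean_travel_time(section_times):
    """Average over vehicles of each vehicle's summed section times.

    section_times[i][j] is the time (s) vehicle i needs to travel from
    detector j to detector j + 1 (layout assumed for this sketch).
    """
    if not section_times:
        raise ValueError("need at least one vehicle")
    totals = [sum(times) for times in section_times]
    return sum(totals) / len(totals)


# Three vehicles crossing two consecutive sections (made-up times in s).
example = [[30.0, 45.0], [28.0, 50.0], [35.0, 40.0]]
tt = mean_travel_time(example)
```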
Mori et al. give a thorough classification of travel time forecasting methods, dividing the forecasting models into naive models, traffic flow models, data models, and hybrid models [6]. Vlahogianni et al. survey short-term traffic forecasting, discussing where the field stands and where it is going [7]. Shao et al. present a real-time travel time forecasting method based on an improved Kalman filter [8]. Chilukuri et al. forecast short-term highway travel time using a microsimulation technique [9]. Yao and Zhang give a short-term forecasting algorithm for interval travel time on urban freeways by analyzing floating car data, which provides a basis for forecasting travel time by subsection [10]. Zhao et al. propose a forecasting algorithm based on equal-interval interpolation and Sage-Husa adaptive Kalman filtering, which effectively improves the forecast accuracy of travel time [11]. Gui and Yu establish a forecast model with a selective forgetting ability, which enables the algorithm to adapt well to changes in trip conditions [12]. The literature reviewed above suggests several ideas for forecasting travel time from the massive data collected by loop detectors. First, travel time is correlated with the traffic state of the road. Second, most travel time forecast models analyze historical data. Third, the characteristics of traffic flow tend to change with the seasons and the environment in a regular way. Thus, if the traffic flow can be divided into several state intervals such that different intervals of the same pattern have similar statistical characteristics (mean and variance), it is easier to obtain an optimized forecast within each pattern than by a global search.
Therefore, this paper proposes a travel time forecasting model based on change-point detection, which uses change-point detection to identify different patterns in the travel time series and sets up an ARIMA forecasting model for each pattern.
The rest of this paper is organized as follows: (1) Preprocessing of the massive data collected by the loop detectors and calculation of the travel time parameter.

Identification and Correction of the Loop Detector's Data
Because of detector faults, communication system faults, and environmental factors, real-time detector data contain unpredictable missing or invalid values. It is therefore necessary to preprocess the data collected by the traffic detectors [13]. This paper gives basic rules, drawn from practical experience, for identifying and correcting loop detector data.
Basic Rule 1. When the traffic volume, speed, or occupancy rate is negative or null, it is recorded as error data. When the volume is significantly greater than the maximum volume of the road ($q_{\max} = \ldots/60$), it is recorded as error data. When the speed is significantly greater than the maximum allowable speed or the capacity of the urban road, it is recorded as error data. When the occupancy rate is not less than 100%, it is recorded as error data.
Basic Rule 2. When the occupancy rate is greater than some reasonable threshold such as 95% and the speed is greater than the normal range for that occupancy, such as 5 km/h, the data is recorded as error data. When the speed is zero and the volume is not zero, the data is error data. When the volume is zero and the occupancy rate is not zero, the data is error data. When the average effective vehicle length, AEVL (m) $= (10 \times V \times h)/q$ ($V$: speed in km/h; $h$: occupancy rate in %; $q$: volume in vehicles/(lane·hour)), is beyond reasonable limits (such as AEVL $\in$ [1.5 m, 30 m]), the data is error data.
Basic Rule 3. Each data item should be similar to the data recorded immediately before it. The data are therefore processed by first-order differencing. If a first-order difference does not fall within the reasonable range of change determined by the preceding data, the item is defined as abruptly changing distortion data.
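Basic Rules 1 and 2 above can be sketched as a single validity check. This is only an illustration under stated assumptions: the function name, the argument names `q`, `v`, `h`, and the limits `q_max` and `v_max` are hypothetical, the 95% and 5 km/h thresholds are the example values quoted in Rule 2, and the volume is assumed to be an hourly rate so that the AEVL formula is dimensionally consistent.

```python
def is_error_record(q, v, h, q_max, v_max, aevl_range=(1.5, 30.0)):
    """Return True if (q, v, h) violates Basic Rule 1 or Basic Rule 2.

    q: volume in veh/(lane*h), v: speed in km/h, h: occupancy rate in %.
    q_max and v_max are site-specific limits chosen by the user.
    """
    # Rule 1: null, negative, or out-of-range values.
    if q is None or v is None or h is None:
        return True
    if q < 0 or v < 0 or h < 0:
        return True
    if q > q_max or v > v_max or h >= 100:
        return True
    # Rule 2: combinations that contradict traffic flow theory.
    if h > 95 and v > 5:
        return True
    if v == 0 and q != 0:
        return True
    if q == 0 and h != 0:
        return True
    if q > 0:
        aevl = 10.0 * v * h / q  # average effective vehicle length in m
        if not (aevl_range[0] <= aevl <= aevl_range[1]):
            return True
    return False
```

For instance, a record with 1200 veh/(lane·h), 60 km/h, and 10% occupancy gives an AEVL of 5 m and passes, while the same record with 1% occupancy gives 0.5 m and is rejected.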
The data collected by the detectors can be expressed as a four-tuple structure, that is, $[t, q, V, h]$. Based on the basic rules discussed above, this paper proposes an algorithm for the real-time identification and correction of loop detector data.

Algorithm 1.
Step 1. Determine $q_{\max}$, $V_{\max}$, and $h_{\max}$ according to the actual situation of the detectors on the road. Test all of the data; if $q > q_{\max}$ or $V > V_{\max}$ or $h > h_{\max}$, the data is defined as error data.
Step 5. Calculate the average effective vehicle length (AEVL) from the currently detected data. If AEVL $\notin$ [1.5, 30], exclude this data.
Step 6. Use the nearest reasonable data to replace the error data found in Steps 1-5.
Step 7. Process the data by first-order differencing. If a difference value does not fall within the reasonable range of change determined by the mean $\mu$ and standard deviation $\sigma$ of the differences of the data before it (such as $[\mu - \lambda_0 \sigma, \mu + \lambda_0 \sigma]$, where $\lambda_0$ is a control parameter), the data is defined as abruptly changing distortion data. Then use $x_{t-1} + (1/\lambda_0) \cdot$ … to replace it.
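Steps 6 and 7 can be sketched with NumPy as follows. This is a simplified illustration, not the exact procedure of Algorithm 1: the function names and the parameter `lam0` are hypothetical, the mean and standard deviation are taken over all first-order differences rather than over the data preceding each point, and flagged points are simply replaced by the nearest valid value.

```python
import numpy as np


def fill_with_nearest_valid(values, valid):
    """Step 6 sketch: replace invalid entries by the nearest valid entry."""
    values = np.asarray(values, dtype=float)
    valid = np.asarray(valid, dtype=bool)
    good = np.flatnonzero(valid)
    if good.size == 0:
        raise ValueError("no valid data to copy from")
    out = values.copy()
    for i in np.flatnonzero(~valid):
        nearest = good[np.argmin(np.abs(good - i))]  # ties pick the earlier one
        out[i] = values[nearest]
    return out


def flag_abrupt_changes(values, lam0=3.0):
    """Step 7 sketch: flag points whose first-order difference lies outside
    [mean - lam0 * std, mean + lam0 * std] of all first-order differences."""
    values = np.asarray(values, dtype=float)
    diffs = np.diff(values)
    mu, sigma = diffs.mean(), diffs.std()
    flags = np.zeros(values.size, dtype=bool)
    flags[1:] = np.abs(diffs - mu) > lam0 * sigma
    return flags


# A flat speed series (km/h) with one spike at index 10.
speeds = np.full(30, 50.0)
speeds[10] = 150.0
flags = flag_abrupt_changes(speeds, lam0=3.0)
cleaned = fill_with_nearest_valid(speeds, ~flags)
```

Note that the point right after a spike is flagged as well, because the jump back to the normal level is just as large as the jump away from it.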

Calculation of Travel Time Based on the Data from Loop Detector
Using the preprocessed data, the travel time parameter values can be calculated [14]. The calculation result of the travel time parameter is usually related to the speed of the vehicle on the road.
International Journal of Distributed Sensor Networks
Assume that the speed changes linearly, that a detector is located at the upstream and downstream end of each road section, and that each trip chain consists of multiple sections. Then $V(t)$, the speed of the vehicle between detector $j$ and detector $j + 1$, can be expressed by formula (1), which is a standard differential equation. Because exact solutions of differential equations are generally very difficult to obtain, an approximation is used instead, so that $x(t)$ can be obtained by formula (2). Here $x_{j+1}$ and $x_j$ stand for the locations of detectors $j + 1$ and $j$; $V(j, k)$ stands for the speed at detector $j$ in time period $k$, which is also the slope of the vehicle motion curve; and $x(t)$ stands for the specific motion trajectory in time period $k$ within road section $j$.
According to formulas (1), (2), and (3), the calculation of the travel time of a road section from the vehicle's moving track falls into two situations: (1) when the speed is fast, for which an approximate result for the travel time of the road section is obtained, and (2) when the speed is slow. The algorithm based on this method can be summarized as follows.
Step 3. If the vehicle's position has reached or passed the location of the destination, the vehicle has arrived. Record the departure time and arrival time of the vehicle and then stop. Otherwise go back to Step 2 to recalculate the travel time.
This algorithm first assumes the motion trajectory of the vehicle on the road and then uses the location-time curve to find the time at which the vehicle leaves the detector area, which gives the travel time [15]. Estimating travel time from the time-space motion trajectory in this way has high accuracy. The error between the calculation results in [13,16] and the result of this algorithm is below 6%, which means that the result of this algorithm is acceptable.
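The idea of following a vehicle along its location-time curve can be sketched as below. This is a minimal illustration and not a reproduction of formulas (1) to (6): it assumes that, within one sampling period, the vehicle moves through a section at a single constant speed, so the trajectory is piecewise linear with the speed as its slope. The function name, the argument layout, and the sample numbers are hypothetical.

```python
def trajectory_travel_time(boundaries, speeds, period, t_start=0.0):
    """Travel time (s) from boundaries[0] to boundaries[-1].

    boundaries: detector positions in m, strictly increasing.
    speeds[k][j]: speed in m/s on section j during sampling period k.
    period: length of one sampling period in s.
    """
    x, t = boundaries[0], t_start
    section = 0
    k = int(t_start // period)
    while section < len(boundaries) - 1:
        if k >= len(speeds):
            raise ValueError("speed data exhausted before arrival")
        period_end = (k + 1) * period
        v = speeds[k][section]
        dist = boundaries[section + 1] - x
        if v > 0 and v * (period_end - t) >= dist:
            # Fast case: the vehicle leaves the section within this period.
            t += dist / v
            x = boundaries[section + 1]
            section += 1
        else:
            # Slow case: the vehicle is still in the section at period end.
            x += max(v, 0.0) * (period_end - t)
            t = period_end
            k += 1
    return t - t_start


# Two 600 m sections, 120 s sampling periods (made-up speeds in m/s).
tt_fast = trajectory_travel_time([0.0, 600.0, 1200.0], [[10.0, 10.0], [5.0, 5.0]], 120.0)
tt_slow = trajectory_travel_time([0.0, 600.0, 1200.0], [[2.0, 2.0], [10.0, 10.0]], 120.0)
```

The loop stops as soon as the vehicle reaches the last detector, which mirrors the arrival test in Step 3.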

Forecast Model for Short-Term Travel Time Based on ARIMA
Change-Point Searching. Because traffic data have different numerical characteristics in different time periods, the data can be divided into a number of small, internally similar states by conditional change-point searching, which can effectively improve the goodness of fit of the model. In [17,18], a new algorithm for state division based on the demand variation of the observation function is introduced. The mean and the variance of the sequence can be expressed by statistical formulas.
Three quantities are defined for this purpose: the whole sequence, the convex (concave) wave, and the observation function, together with the index set of all points on the curve. The algorithm is as follows.
The paper [18] provides a complete method for improving this kind of algorithm. However, that algorithm gives no formula for choosing its crucial control parameter and simply sets it to a constant value such as 0.5. In this paper, we carry out several experiments with different values of this control parameter, which provide a reference for choosing it when the travel time is inferred from detector data.

ARIMA Forecasting Model. Before a short-term time series forecasting model is built, the series is preprocessed with a stationarity test and a randomness test. If the time series is nonstationary, it needs to be transformed into a stationary series by differencing. In this circumstance, the ARIMA model is converted to an ARMA model. The sequence of $d$-order differences is expressed as $\nabla^d x_t = (1 - B)^d x_t$, where $B$ is the backshift operator. The orders $p$ and $q$ of the ARIMA$(p, d, q)$ model are determined from the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the ARMA series obtained after differencing, and the model is identified according to the characteristics of the ACF and PACF coefficients.
For the randomness test, the data collected by the detector are densely sampled, so the calculated travel time is also a large, dense sample, and the hypothesis of pure randomness is tested with the statistic $Q = n \sum_{k=1}^{m} \hat{\rho}_k^2$, where $n$ is the sample size, $m$ is the number of delay lags, and $\hat{\rho}_k$ is the sample autocorrelation coefficient at lag $k$. When $Q$ is less than the quantile $\chi^2_{1-\alpha}(m)$, the sequence is a pure random sequence. However, when the travel time is modeled by the ARIMA model and the sample becomes small, the modified LB statistic can be used: $LB = n(n + 2) \sum_{k=1}^{m} \hat{\rho}_k^2 / (n - k)$. Because the sequence has only a short-term significant correlation, the hypothesis is tested only for $Q$ and $LB$ at short delay lags, generally fewer than 10.
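The two randomness statistics can be computed directly from the sample autocorrelations. The sketch below uses the standard Box-Pierce and Ljung-Box definitions; the function names are hypothetical, and the alternating test series is made up to show a strongly correlated case.

```python
import numpy as np


def sample_autocorr(x, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0:
        raise ValueError("constant series has no defined autocorrelation")
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])


def q_and_lb(x, max_lag):
    """Return the Q statistic and the modified LB statistic."""
    n = len(x)
    rho = sample_autocorr(x, max_lag)
    lags = np.arange(1, max_lag + 1)
    q = n * np.sum(rho ** 2)
    lb = n * (n + 2) * np.sum(rho ** 2 / (n - lags))
    return q, lb


# An alternating series is far from purely random at lag 1.
series = [1.0, -1.0] * 20
q_stat, lb_stat = q_and_lb(series, max_lag=1)
```

Each statistic is then compared with the chi-square quantile for `max_lag` degrees of freedom; at the 5% level with one lag that quantile is about 3.84, so the alternating series above is clearly not purely random.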
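After differencing, the ARIMA model reduces to the standard ARMA model, whose standard form is $x_t = \mu + \frac{\Theta_q(B)}{\Phi_p(B)} \varepsilon_t$. In the formula, $\varepsilon_t \sim WN(0, \sigma_\varepsilon^2)$, $\Theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, and $\Phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$. There are $p + q + 2$ unknown parameters in the ARMA model: $\phi_1, \phi_2, \ldots, \phi_p, \theta_1, \theta_2, \ldots, \theta_q, \mu, \sigma_\varepsilon^2$. Moment estimation is usually used to obtain the values of $\mu$ and $\sigma_\varepsilon^2$.
Taking the expectation and variance of each side of formula (16) yields estimates of $\mu$ and $\sigma_\varepsilon^2$, so the number of parameters still to be estimated is reduced to $p + q$. Least squares estimation is used to estimate the parameters of the ARMA model obtained by differencing.
In the case of ARMA$(p, q)$, the parameter vector is $\tilde{\beta} = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)'$. The overall observed sum of squared residuals of the sample is $S(\tilde{\beta}) = \sum_{t=1}^{n} \varepsilon_t^2$, where $\varepsilon_t$ is the residual of the fitted model at time $t$. Set the objective function for parameter estimation as $\min S(\tilde{\beta})$ and use the weighted least squares method to solve for the parameters.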
The essence of the weighted least squares method is to transform the original data to obtain new explanatory variables and a new explained variable. Assume that $x_t$ is the time series data and $w_t$ is the weight of this point, so that the weighted least squares transformation is

$y_t = w_t \cdot x_t$ ($y_t$ refers to the travel time after transformation). (21)
Then, use the transformed series $(y_1, \ldots, y_n)$ to do the least squares parameter estimation of the ordinary ARIMA model, which gives the optimal parameter vector $\tilde{\beta}$ of the model under the weighted transformation. In this way, the weighted forecast $\hat{y}_{t+1}$ is obtained from the fitted model. After removing the weight, the final forecast value is obtained: $\hat{x}_{t+1} = \hat{y}_{t+1} / w_{t+1}$.
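The weighted transformation, the ordinary least squares fit, and the removal of the weight can be sketched as follows. This is a simplified illustration under stated assumptions: it fits a pure AR($p$) model without a mean term instead of a full ARIMA model, and the function names, the weights, and the test series are hypothetical.

```python
import numpy as np


def fit_ar_least_squares(y, p):
    """Ordinary least squares fit of y_t = phi_1*y_{t-1} + ... + phi_p*y_{t-p}."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column i holds the lag-(i + 1) values aligned with the targets y[p:].
    X = np.column_stack([y[p - i - 1:n - i - 1] for i in range(p)])
    phi, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return phi


def weighted_forecast(x, w, w_next, p=2):
    """One-step forecast: weight the series, fit, forecast, remove the weight."""
    y = np.asarray(w, dtype=float) * np.asarray(x, dtype=float)  # formula (21)
    phi = fit_ar_least_squares(y, p)
    y_next = float(np.dot(phi, y[::-1][:p]))  # most recent value first
    return y_next / w_next, phi


# Noise-free AR(2) series: x_t = 0.5*x_{t-1} + 0.3*x_{t-2}.
x = [1.0, 2.0]
for _ in range(10):
    x.append(0.5 * x[-1] + 0.3 * x[-2])
forecast, phi = weighted_forecast(x, w=np.ones(len(x)), w_next=1.0, p=2)
```

With unit weights the fit recovers the generating coefficients; other weight functions, such as weights that grow linearly toward the most recent observation, can be passed through `w` and `w_next` in the same way.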

Preprocessing of the Massive Data from Loop Detectors.
This paper takes actual data from the 2nd ring road of a big city as an example (detector number is 020**; line number is Lan 1-Lan 6; date is Mar. 3rd, 2013; data collection time is 24 hours; sampling interval is 2 minutes; parameters are traffic volume, speed, and occupancy rate; and the total number of data points is 720) to verify the loop-detector-based travel time forecasting model described above. The actual data of Lan 1 is shown in Figure 2.
In Figure 2, mutation points can be observed in the data series of all three parameters. In fact, traffic conditions cannot change by more than 500% within two minutes. So it can be concluded that the actual data contain abnormal or distorted values, and it is necessary to filter them.
The data cleaning is carried out according to Algorithm 1. First, the control parameters are set from the basic principles of traffic flow and their actual physical meaning, as listed in Table 1. Under these control parameters, the cleaning finds the data that lie beyond the maximum control range or contradict traffic flow theory. The result is shown in Figure 3.
A one-dimensional validity matrix records whether each record is valid. All entries of the matrix are initially 1. When abnormal data is detected, the corresponding entry is changed to 0. Figure 3 shows a series of error data points at 3:00-6:00, which is consistent with the original graph shown in Figure 4.
Because these error data points have no actual physical meaning, they are replaced by the closest normal record. After the cleaning, the volume, speed, and occupancy rate are as shown in Figure 5.
Data quality has been improved to a certain extent, especially for the speed data, but mutations remain in the filtering results. The first-order difference of the data is therefore tested to identify the mutation data. The first-order difference graph of the intermediate state is shown in Figure 6. The control parameter of the differencing operation is $\lambda_0$. Assuming that $\lambda_0 = 3$ defines the reasonable change region, if the first-order difference lies outside the control range of three standard deviations, the corresponding entry of the validity matrix is changed to 0.
When the control parameter $\lambda_0 = 3$, there are 83 abnormal data points, as shown in Figure 7, which account for 11.5% of the 720 records. This result is too strict for detector data, so the value of $\lambda_0$ can be increased to relax the allowed change range of the data, as shown in Table 2.
According to the results shown in the table, $\lambda_0 = 4$ is a more reasonable value of the control parameter. Figure 8 compares the final result of the filter with the original data, and it can be seen that the algorithm basically meets the requirements for cleaning and preprocessing loop detector data.
From Figure 3 it can be seen that the algorithm makes only small corrections to the traffic volume and occupancy rate data but has a stronger effect on the speed data. In predicting travel time, often only the detector speed is used, which means that the algorithm can be simplified so that it only needs to process speed data.
Because only the speed data is processed, we need to set the upper and lower limits of speed and the limit of the first-order difference change range to restrict the data. Because detector data are used to calculate and forecast travel time, we need data from three consecutive detectors to simulate the travel time forecast for the whole road network. The sketch map is shown in Figure 9. With the control parameter of the differencing operation set to 3, the filtered speed of the three detectors after cleaning is as shown in Figures 10, 11, and 12. As shown in Figures 10-12, the red correction curve basically achieves a reasonable correction of the distorted data.

Calculation of Travel Time Parameters.
Using the travel time conversion model given by formulas (5) and (6), the travel time can be converted from the speed data. For example, when one calculates the travel time driving from east to west, the vehicles pass through sections {Detector 1, Detector 2} and {Detector 2, Detector 3}. The result is shown in Figure 13, and the unit of time is s/m.

Pattern Partition of Travel Time Series Based on Change-Point Analysis.
Because it is not clear how to choose the control parameter of the change-point search for a given sample size, we select 7 values of it to search for change-points. The result is in Table 3.
According to the characteristics of the travel time series, we need to select the result that has 5 to 10 change-points. So … Figure 15. … sequence, we found that it is a nonrandom stationary sequence. The order determination result of the ARIMA model for the period 13:28-17:00 is (2, 0, 0). After the weighted least squares problem is transformed into an ordinary least squares problem, we use four kinds of weight function to do the fitting experiment and carry out an error analysis of the output results. The forecast results of the different weight functions are given in Table 4.
The error analysis results are shown in Table 5, from which we can see that the crucial index, MAPE, is reduced to a certain degree at the proximal point.
The forecast results of all four weighting functions meet the basic accuracy requirement of an error within 10%. At the same time, the linear weight function fits and forecasts the experimental data well. According to the statistics in Table 5, the linear weighting function is the optimal weight function for this forecast.
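For reference, the MAPE index used in the error analysis can be computed as in the following sketch, which uses the standard definition of the mean absolute percentage error. The function name and the sample values are hypothetical and are not taken from Table 4 or Table 5.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be nonzero")
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Made-up travel times (s): each forecast is off by 10% of the actual value.
example_mape = mape([100.0, 200.0], [110.0, 180.0])
```

A forecast meets a 10% accuracy requirement when this value does not exceed 10.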

Conclusion
This paper uses a change-point detection algorithm to divide the travel time series into several patterns and sets up an ARIMA forecasting model for each pattern, based on the massive data collected by the loop detectors on the roads. Unlike traditional forecasting methods, this approach makes it easier to obtain an optimized forecast than a global search does, because different intervals of the same pattern have similar statistical characteristics (mean and variance). In dividing the travel time series, the calculation of the algorithm is complicated, and the control parameters are obtained only by experiment, which still needs further research.
performed the data preprocessing and revised the paper. Peng Zhang and Kang Song analyzed the data.